code review (round 2) for memory commit failure fix (8013057)

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Jun 7 06:36:51 PDT 2013


Thanks for your understanding!

Dan


On 6/7/13 1:47 AM, David Holmes wrote:
> Thanks Dan. Understood re error handling.
>
> I don't think we need the new _impl method to be in os::XXX but I
> won't force you to make any further changes. (Need to keep you on side
> for an FDS issue ;-) )
>
> Cheers,
> David
>
> On 7/06/2013 12:49 PM, Daniel D. Daugherty wrote:
>> David,
>>
>> Thanks for the re-review. Replies are embedded below.
>>
>>
>> On 6/6/13 8:05 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> On 6/06/2013 10:17 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Since we're in a holding pattern with RT_Baseline, I've taken
>>>> the opportunity to make the code style changes that Stefan
>>>> requested.
>>>>
>>>> New webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/3-hsx25/
>>>>
>>>> Stefan has already re-reviewed and approves. JPRT-hotspotwest job
>>>> is in progress.
>>>
>>> For the record I find it somewhat confusing to have these different
>>> failure semantics depending on the platform. It might simplify both
>>> the pd code and the shared code if any mmap failure were simply a
>>> fatal error. But we're not taking that path so ...
>>>
>>> I don't really understand the "recoverable mmap error" logic:
>>>
>>> 2557   case EBADF:
>>> 2558   case EINVAL:
>>> 2559   case ENOTSUP:
>>>
>>> these are, we hope, errors caused by failed pre-conditions and so we
>>> expect that at the point these issues are detected the mmap has been a
>>> no-op and so no reservation has been lost. Fair enough.
>>
>> Prior to this fix, we called os::commit_memory(), it came back
>> with true or false, and the caller "did something" with that failure
>> mode. The idea is that the upper layers have some idea about the
>> overall semantics of what they are trying to do and are in the best
>> position to decide whether to fail or retry later.
>>
>> Along comes our realization that the underlying mmap() API doesn't
>> quite behave the way we expect on all OSes. If all the OSes were
>> equally screwed up, we would fix this a different way. However, it
>> is just Linux and Solaris that screw this up so we limit the new
>> vm_exit_out_of_memory() calls for this mmap() failure to those two
>> OSes.
>>
>> Any failures that make it back from os::commit_memory() to the upper
>> layers will continue to be handled at those upper layers the way they
>> were handled before. I don't think we're ready to completely rewrite
>> the whole reserve_memory/commit_memory model. That would be a bit
>> much since this has to be backported.
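>>
>> To make that division of labor concrete, here is a minimal sketch of
>> the caller-side pattern (a hypothetical call site, not one from the
>> webrev; the expand_by() name and its fallback policy are made up for
>> illustration):
>>
>>   // The upper layer sees only a boolean result and applies its own
>>   // policy: fail the operation, retry later, or exit the VM.
>>   bool expand_by(char* base, size_t bytes) {
>>     if (!os::commit_memory(base, bytes, !ExecMem)) {
>>       // e.g. give up on this expansion and let the requesting code
>>       // fall back to a smaller request or trigger a GC
>>       return false;
>>     }
>>     return true;
>>   }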
>>
>>> But to me these indicate programming errors in our code so why not
>>> fail-fast anyway?
>>
>> Maybe they are programming errors and maybe they are crazy 
>> commit_memory()
>> probes in an attempt to adapt to some page size thing. I can't say that
>> I understand all the reasons that the above three errnos are returned
>> nor do I need to for this bug fix. What I do know is that these three
>> errnos were permitted to trickle back before and it appears to be safe
>> to continue to let them do so since they won't mess up existing 
>> mappings.
>>
>>
>>> If these occur we are going to lose the detailed information by
>>> returning false and this may not even trigger an upper layer failure
>>> allowing a programming error to go undetected.
>>
>> Sorry, this is no different from how we behaved before, so I'm not
>> planning to worry about that for this fix. As I said above, we're not
>> rewriting the whole reserve_memory/commit_memory model.
>>
>>
>>> Is there a reason that:
>>>
>>> int os::Linux::commit_memory_impl
>>>
>>> is not simply
>>>
>>> static int commit_memory_impl? It seems to be purely internal and so
>>> doesn't need to be part of the os::Linux exported API. Ditto for the
>>> Solaris version.
>>
>> Seemed like the place to put it since there are other similar
>> peers at the os::Linux level. I used 'static' functions for the
>> things that were strictly helper functions. That also seemed
>> like the right thing to do.
>>
>>
>>
>>> ---
>>>
>>> Otherwise ok. This is a messy issue, thanks for taking the time to try
>>> and make it cleaner.
>>
>> Thanks! I'm not planning to make any changes based on your review.
>> Please let me know if you're OK with that.
>>
>> Credit for making it cleaner goes to Stefan K. He kept after me about
>> the readability of this code and about not overloading too many things
>> into one construct.
>>
>> Dan
>>
>>
>>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>> On 6/5/13 7:01 AM, Daniel D. Daugherty wrote:
>>>>> Stefan,
>>>>>
>>>>> Thanks for the re-review! And for several off-line reviews
>>>>> as we were trying to work out this 'round 2' version.
>>>>>
>>>>>
>>>>> On 6/5/13 2:02 AM, Stefan Karlsson wrote:
>>>>>> On 06/04/2013 07:32 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have another revised version of the proposed fix for the 
>>>>>>> following
>>>>>>> bug:
>>>>>>>
>>>>>>>     8013057 assert(_needs_gc ||
>>>>>>> SafepointSynchronize::is_at_safepoint())
>>>>>>>             failed: only read at safepoint
>>>>>>>
>>>>>>> Here are the (round 2) webrev URLs:
>>>>>>>
>>>>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/
>>>>>>> Internal:
>>>>>>> http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/2-hsx25/
>>>>>>
>>>>>> Looks good. Thanks for fixing this and making sure that this new
>>>>>> code works well with the error handling in the GC code.
>>>>>
>>>>> Thanks! The new commit_memory_or_exit() cleaned up call sites in
>>>>> several places so I appreciate your feedback.
>>>>>
>>>>>
>>>>>> I have two code style comments that are not necessary to fix, but it
>>>>>> would make me happy if they were.
>>>>>>
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/share/vm/runtime/os.hpp.udiff.html 
>>>>>>
>>>>>> +// Executable parameter flag for os::commit_memory() and
>>>>>> +// os::commit_memory_or_exit().
>>>>>> +enum ExecMemFlag {
>>>>>> +  ExecMem = true
>>>>>> +};
>>>>>>
>>>>>>
>>>>>> Why is this an Enum and not a const bool when it's used as a bool?
>>>>>
>>>>> Hmmm... It's an Enum because that seemed to be the style used in
>>>>> other places in the VM when we didn't want to pass
>>>>> "true /* some comment */" as a parameter. Do you happen to have a
>>>>> "const bool" example I could look at? I'll poke around, but if you
>>>>> could point me at an example, I would appreciate it...
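>>>>>
>>>>> In the meantime, for comparison, here is a minimal sketch of the two
>>>>> forms side by side (the "const bool" line is hypothetical and shown
>>>>> only to contrast with the enum from the webrev; a call site reads the
>>>>> same either way):
>>>>>
>>>>>   const bool ExecMem = true;          // hypothetical alternative
>>>>>
>>>>>   enum ExecMemFlag {                  // form used in the webrev
>>>>>     ExecMem = true
>>>>>   };
>>>>>
>>>>>   // either way, a non-executable commit reads:
>>>>>   os::commit_memory(addr, size, !ExecMem);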
>>>>>
>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/os/linux/vm/os_linux.cpp.udiff.html 
>>>>>>
>>>>>> +  // see if the error is one we can let the caller handle
>>>>>> +  switch (err) {
>>>>>> +  case EBADF:
>>>>>> +  case EINVAL:
>>>>>> +  case ENOTSUP:
>>>>>> +    // let the caller deal with these errors
>>>>>> +    break;
>>>>>> +
>>>>>> +  default:
>>>>>> +    // Any remaining errors on this OS can cause our reserved mapping
>>>>>> +    // to be lost. That can cause confusion where different data
>>>>>> +    // structures think they have the same memory mapped. The worst
>>>>>> +    // scenario is if both the VM and a library think they have the
>>>>>> +    // same memory mapped.
>>>>>> +    warn_fail_commit_memory(addr, size, exec, err);
>>>>>> +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved memory.");
>>>>>> +    break;
>>>>>> +  }
>>>>>>
>>>>>> and
>>>>>>
>>>>>> +    switch (err) {
>>>>>> +    case EBADF:
>>>>>> +    case EINVAL:
>>>>>> +    case ENOTSUP:
>>>>>> +      // these errors are OK for retrying with small pages
>>>>>> +      break;
>>>>>> +
>>>>>> +    default:
>>>>>> +      // Any remaining errors on this OS can cause our reserved mapping
>>>>>> +      // to be lost. That can cause confusion where different data
>>>>>> +      // structures think they have the same memory mapped. The worst
>>>>>> +      // scenario is if both the VM and a library think they have the
>>>>>> +      // same memory mapped.
>>>>>> +      //
>>>>>> +      // However, it is not clear that this loss of our reserved mapping
>>>>>> +      // happens with large pages on Linux or that we cannot recover
>>>>>> +      // from the loss. For now, we just issue a warning and we don't
>>>>>> +      // call vm_exit_out_of_memory(). This issue is being tracked by
>>>>>> +      // JBS-8007074.
>>>>>> +      warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>>>>>> +#if 0
>>>>>> +      vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>>>> +                            "committing reserved memory.");
>>>>>> +#endif
>>>>>> +      break;
>>>>>> +    }
>>>>>>
>>>>>> These construct are rather long and duplicated. I'd prefer if this
>>>>>> could be extracted out into a helper function so that the logic of
>>>>>> commit_memory_impl is a bit clearer.
>>>>>
>>>>> I agree that this could be improved. You had recommended this:
>>>>>
>>>>> if (!recoverable_error(err)) {
>>>>>   warn_fail_commit_memory(addr, size, exec, err);
>>>>>   vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved memory.");
>>>>> }
>>>>>
>>>>> and this:
>>>>>
>>>>> if (!recoverable_error(err)) {
>>>>>   warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>>>>> #if 0
>>>>>   vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved memory.");
>>>>> #endif
>>>>> }
>>>>>
>>>>> I concur that the identical parts can be refactored into
>>>>> recoverable_error(), but the "However, it is not clear..."
>>>>> paragraph needs to stay with the disabled vm_exit_out_of_memory()
>>>>> call. BTW, Coleen requested (off-thread) that the call be
>>>>> commented out instead of #if'ed out.
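>>>>>
>>>>> For reference, the helper would boil down to a small predicate along
>>>>> these lines (a sketch only; the recoverable_error() name comes from
>>>>> your snippets above, not from the current webrev):
>>>>>
>>>>>   // true if the mmap() errno indicates a failed pre-condition that
>>>>>   // cannot have clobbered an existing reserved mapping, so the
>>>>>   // caller may safely handle (or retry) the failure itself
>>>>>   static bool recoverable_error(int err) {
>>>>>     return err == EBADF || err == EINVAL || err == ENOTSUP;
>>>>>   }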
>>>>>
>>>>> I'll definitely make these changes if I have to re-roll the
>>>>> patch for non-code style changes. Depending on how this
>>>>> changeset lines up with RT_Baseline's schedule this week,
>>>>> I may even make the changes anyway.
>>>>>
>>>>> Again, thanks for the many review cycles.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> thanks,
>>>>>> StefanK
>>>>>>
>>>>>>>
>>>>>>> Testing:
>>>>>>> - Aurora Adhoc vm.quick batch for all OSes in the following 
>>>>>>> configs:
>>>>>>>   {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>>>>> - I've created a standalone Java stress test with a shell script
>>>>>>>   wrapper that reproduces the failing code paths on my Solaris X86
>>>>>>>   server and on Ron's DevOps Linux machine. This test will not be
>>>>>>>   integrated since running the machine out of swap space is very
>>>>>>>   disruptive (crashes the window system, causes various services to
>>>>>>>   exit, etc.)
>>>>>>>
>>>>>>> There are three parts to this fix:
>>>>>>>
>>>>>>> 1) Detect commit memory failures on Linux and Solaris where the
>>>>>>>    previous reservation can be lost and call 
>>>>>>> vm_exit_out_of_memory()
>>>>>>>    to report the resource exhaustion. Add 
>>>>>>> os::commit_memory_or_exit()
>>>>>>>    API to provide more consistent handling of 
>>>>>>> vm_exit_out_of_memory()
>>>>>>>    calls.
>>>>>>> 2) Change existing os::commit_memory() calls to make the executable
>>>>>>>    status of memory more clear; this makes security analysis 
>>>>>>> easier.
>>>>>>> 3) Clean up some platform dependent layer calls that were resulting
>>>>>>>    in extra NMT accounting. Clean up some mmap() return value 
>>>>>>> checks.
>>>>>>>
>>>>>>> Gory details are below. As always, comments, questions and
>>>>>>> suggestions are welcome.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> Gory Details:
>>>>>>>
>>>>>>> The VirtualSpace data structure is built on top of the 
>>>>>>> ReservedSpace
>>>>>>> data structure. VirtualSpace presumes that failed 
>>>>>>> os::commit_memory()
>>>>>>> calls do not affect the underlying ReservedSpace memory mappings.
>>>>>>> That assumption is true on MacOS X and Windows, but it is not true
>>>>>>> on Linux or Solaris. The mmap() system call on Linux or Solaris can
>>>>>>> lose previous mappings in the event of certain errors. On MacOS X,
>>>>>>> the mmap() system call clearly states that previous mappings are
>>>>>>> replaced only on success. On Windows, a different set of APIs are
>>>>>>> used and they do not document any loss of previous mappings.
>>>>>>>
>>>>>>> The solution is to implement the proper failure checks in the
>>>>>>> os::commit_memory() implementations on Linux and Solaris. On 
>>>>>>> MacOS X
>>>>>>> and Windows, no additional checks are needed.
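>>>>>>>
>>>>>>> The shape of the added check is roughly the following (simplified
>>>>>>> from the Linux version; see os_linux.cpp in the webrev for the
>>>>>>> exact code, mmap() flags and helper placement):
>>>>>>>
>>>>>>>   // 'res' is the mmap() result for the commit; 'exec' the
>>>>>>>   // executable flag passed in by the caller
>>>>>>>   if (res == (uintptr_t) MAP_FAILED) {
>>>>>>>     int err = errno;
>>>>>>>     if (err != EBADF && err != EINVAL && err != ENOTSUP) {
>>>>>>>       // the failed MAP_FIXED commit may have clobbered part of
>>>>>>>       // our reservation, so fail fast instead of returning false
>>>>>>>       warn_fail_commit_memory(addr, size, exec, err);
>>>>>>>       vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>>>>>                             "committing reserved memory.");
>>>>>>>     }
>>>>>>>     return err;   // recoverable errors go back to the caller
>>>>>>>   }
>>>>>>>   return 0;       // success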
>>>>>>>
>>>>>>> During code review round 1, there was a request from the GC team to
>>>>>>> provide an os::commit_memory_or_exit() entry point in order to
>>>>>>> preserve
>>>>>>> the existing error messages on all platforms. This entry point 
>>>>>>> allows
>>>>>>> code like this:
>>>>>>>
>>>>>>> src/share/vm/gc_implementation/parallelScavenge/cardTableExtension.cpp: 
>>>>>>>
>>>>>>>  568       if (!os::commit_memory((char*)new_committed.start(),
>>>>>>>  569                              new_committed.byte_size())) {
>>>>>>>  570         vm_exit_out_of_memory(new_committed.byte_size(), OOM_MMAP_ERROR,
>>>>>>>  571                               "card table expansion");
>>>>>>>
>>>>>>> to be replaced with code like this:
>>>>>>>
>>>>>>>  568       os::commit_memory_or_exit((char*)new_committed.start(),
>>>>>>>  569                                 new_committed.byte_size(), !ExecMem,
>>>>>>>  570                                 "card table expansion");
>>>>>>>
>>>>>>> All uses of os::commit_memory() have been visited and those 
>>>>>>> locations
>>>>>>> that previously exited on error have been updated to use the new
>>>>>>> entry
>>>>>>> point. This new entry point cleans up the original call sites 
>>>>>>> and the
>>>>>>> vm_exit_out_of_memory() calls are now consistent on all platforms.
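>>>>>>>
>>>>>>> For reference, the new entry point is declared along these lines in
>>>>>>> os.hpp (paraphrased here; see the webrev for the exact declarations,
>>>>>>> including the alignment_hint overload):
>>>>>>>
>>>>>>>   // Commits the memory region or calls vm_exit_out_of_memory()
>>>>>>>   // with 'mesg' if the commit fails; does not return on failure.
>>>>>>>   static void commit_memory_or_exit(char* addr, size_t bytes,
>>>>>>>                                     bool executable,
>>>>>>>                                     const char* mesg);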
>>>>>>>
>>>>>>> As a secondary change, while visiting all os::commit_memory()
>>>>>>> calls, I
>>>>>>> also updated them to use the new ExecMem enum in order to make the
>>>>>>> executable status of the memory more clear. Since executable
>>>>>>> memory can
>>>>>>> be an attack vector, it is prudent to make the executable status of
>>>>>>> memory crystal clear. This also allowed me to remove the default
>>>>>>> executable flag value of 'false'. Now all new uses of 
>>>>>>> commit_memory()
>>>>>>> must be clear about the executable status of the memory.
>>>>>>>
>>>>>>> There are also tertiary changes where some of the 
>>>>>>> pd_commit_memory()
>>>>>>> calls were calling os::commit_memory() instead of calling their
>>>>>>> sibling
>>>>>>> os::pd_commit_memory(). This resulted in double NMT tracking so 
>>>>>>> this
>>>>>>> has also been fixed. There were also some incorrect mmap() return
>>>>>>> value checks which have been fixed.
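>>>>>>>
>>>>>>> The shape of that cleanup is simply the following (illustrative
>>>>>>> only; the actual call sites are in the platform files in the
>>>>>>> webrev):
>>>>>>>
>>>>>>>   // before: re-entering the shared os:: wrapper from platform code
>>>>>>>   // records the commit with NMT a second time
>>>>>>>   os::commit_memory(addr, size, exec);
>>>>>>>
>>>>>>>   // after: platform code calls the pd_ sibling directly, so the
>>>>>>>   // NMT accounting happens only once, in the shared os:: wrapper
>>>>>>>   os::pd_commit_memory(addr, size, exec);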
>>>>>>>
>>>>>>> Just to be clear: This fix simply detects the "out of swap space"
>>>>>>> condition properly on Linux and Solaris and causes the VM to fail
>>>>>>> in a more orderly fashion with a message that looks like this:
>>>>>>>
>>>>>>> The Java process' stderr will show:
>>>>>>>
>>>>>>> INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0) failed; errno=11
>>>>>>> #
>>>>>>> # There is insufficient memory for the Java Runtime Environment to continue.
>>>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for committing reserved memory.
>>>>>>> # An error report file with more information is saved as:
>>>>>>> # /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log
>>>>>>>
>>>>>>> The hs_err_pid file will have the more verbose info:
>>>>>>>
>>>>>>> #
>>>>>>> # There is insufficient memory for the Java Runtime Environment to continue.
>>>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for committing reserved memory.
>>>>>>> # Possible reasons:
>>>>>>> #   The system is out of physical RAM or swap space
>>>>>>> #   In 32 bit mode, the process size limit was hit
>>>>>>> # Possible solutions:
>>>>>>> #   Reduce memory load on the system
>>>>>>> #   Increase physical memory or swap space
>>>>>>> #   Check if swap backing store is full
>>>>>>> #   Use 64 bit Java on a 64 bit OS
>>>>>>> #   Decrease Java heap size (-Xmx/-Xms)
>>>>>>> #   Decrease number of Java threads
>>>>>>> #   Decrease Java thread stack sizes (-Xss)
>>>>>>> #   Set larger code cache with -XX:ReservedCodeCacheSize=
>>>>>>> # This output file may be truncated or incomplete.
>>>>>>> #
>>>>>>> #  Out of Memory Error (/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/solaris/vm/os_solaris.cpp:2791), pid=9111, tid=21
>>>>>>> #
>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build 1.8.0-ea-b89)
>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b33-bh_hsx_rt_exp_8013057_dcubed-product-fastdebug mixed mode solaris-amd64 compressed oops)
>>>>>>> # Core dump written. Default location: /work/shared/bugs/8013057/looper.03/core or core.9111
>>>>>>> #
>>>>>>>
>>>>>>> You might be wondering why we are assuming that the failed mmap()
>>>>>>> commit operation has lost the 'reserved memory' mapping.
>>>>>>>
>>>>>>>     We have no good way to determine if the 'reserved memory'
>>>>>>>     mapping is lost. Since the other threads are not idle, it is
>>>>>>>     possible for another thread to have 'reserved' the same memory
>>>>>>>     space for a different data structure. Our thread could observe
>>>>>>>     that the memory is still 'reserved' but would have no way to
>>>>>>>     know whether that reservation is ours.
>>>>>>>
>>>>>>> You might be wondering why we can't recover from this transient
>>>>>>> resource availability issue.
>>>>>>>
>>>>>>>     We could retry the failed mmap() commit operation, but we would
>>>>>>>     again run into the issue that we no longer know which data
>>>>>>>     structure 'owns' the 'reserved' memory mapping. In particular,
>>>>>>> the
>>>>>>>     memory could be reserved by native code calling mmap()
>>>>>>> directly so
>>>>>>>     the VM really has no way to recover from this failure.
>>>>>>>
>>>>>>> You might be wondering why part of this work is deferred:
>>>>>>>
>>>>>>> 2654     default:
>>>>>>> 2655       // Any remaining errors on this OS can cause our reserved mapping
>>>>>>> 2656       // to be lost. That can cause confusion where different data
>>>>>>> 2657       // structures think they have the same memory mapped. The worst
>>>>>>> 2658       // scenario is if both the VM and a library think they have the
>>>>>>> 2659       // same memory mapped.
>>>>>>> 2660       //
>>>>>>> 2661       // However, it is not clear that this loss of our reserved mapping
>>>>>>> 2662       // happens with large pages on Linux or that we cannot recover
>>>>>>> 2663       // from the loss. For now, we just issue a warning and we don't
>>>>>>> 2664       // call vm_exit_out_of_memory(). This issue is being tracked by
>>>>>>> 2665       // JBS-8007074.
>>>>>>> 2666       warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>>>>>>> 2667 #if 0
>>>>>>> 2668       vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>>>>> 2669                             "committing reserved memory.");
>>>>>>> 2670 #endif
>>>>>>> 2671       break;
>>>>>>>
>>>>>>>     When lines 2668-2669 are enabled and UseHugeTLBFS is specified,
>>>>>>>     then the VM will exit because no more huge/large pages are
>>>>>>>     available. It is not yet clear that this transition from large
>>>>>>>     to small pages is actually unsafe, but we don't yet have proof
>>>>>>>     that it is safe either. More research will be done via
>>>>>>>     JBS-8007074.
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>


