code review (round 2) for memory commit failure fix (8013057)
Dmitry Samersoff
dmitry.samersoff at oracle.com
Tue Jun 18 02:46:02 PDT 2013
Dan,
Revised fix looks OK for me.
-Dmitry
On 2013-06-06 04:17, Daniel D. Daugherty wrote:
> Greetings,
>
> Since we're in a holding pattern with RT_Baseline, I've taken
> the opportunity to make the code style changes that Stefan
> requested.
>
> New webrev:
>
> http://cr.openjdk.java.net/~dcubed/8013057-webrev/3-hsx25/
>
> Stefan has already re-reviewed and approves. JPRT-hotspotwest job
> is in progress.
>
> Dan
>
>
>
> On 6/5/13 7:01 AM, Daniel D. Daugherty wrote:
>> Stefan,
>>
>> Thanks for the re-review! And for several off-line reviews
>> as we were trying to work out this 'round 2' version.
>>
>>
>> On 6/5/13 2:02 AM, Stefan Karlsson wrote:
>>> On 06/04/2013 07:32 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have another revised version of the proposed fix for the following
>>>> bug:
>>>>
>>>> 8013057 assert(_needs_gc ||
>>>> SafepointSynchronize::is_at_safepoint())
>>>> failed: only read at safepoint
>>>>
>>>> Here are the (round 2) webrev URLs:
>>>>
>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/
>>>> Internal:
>>>> http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/2-hsx25/
>>>
>>> Looks good. Thanks for fixing this and making sure that this new code
>>> work well with the error handling in the GC code.
>>
>> Thanks! The new commit_memory_or_exit() cleaned up call sites in
>> several places so I appreciate your feedback.
>>
>>
>>> I have two code style comments that are not necessary to fix, but it
>>> would make me happy if they were.
>>>
>>>
>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/share/vm/runtime/os.hpp.udiff.html
>>>
>>>
>>> +// Executable parameter flag for os::commit_memory() and
>>> +// os::commit_memory_or_exit().
>>> +enum ExecMemFlag {
>>> + ExecMem = true
>>> +};
>>>
>>>
>>> Why is this an Enum and not a const bool when it's used as a bool?
>>
>> Hmmm... It's an Enum because that seemed to be style used in other places
>> in the VM when we didn't want to pass "true /* some comment */" as a
>> parameter. Do you happen to have a "const bool" example I could look at?
>> I'll poke around, but if you could point me at an example, I would
>> appreciate it...
>>
>>
>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/os/linux/vm/os_linux.cpp.udiff.html
>>>
>>>
>>> + // see if the error is one we can let the caller handle
>>> + switch (err) {
>>> + case EBADF:
>>> + case EINVAL:
>>> + case ENOTSUP:
>>> + // let the caller deal with these errors
>>> + break;
>>> +
>>> + default:
>>> + // Any remaining errors on this OS can cause our reserved mapping
>>> + // to be lost. That can cause confusion where different data
>>> + // structures think they have the same memory mapped. The worst
>>> + // scenario is if both the VM and a library think they have the
>>> + // same memory mapped.
>>> + warn_fail_commit_memory(addr, size, exec, err);
>>> + vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>>> memory.");
>>> + break;
>>> + }
>>>
>>> and
>>>
>>> + switch (err) {
>>> + case EBADF:
>>> + case EINVAL:
>>> + case ENOTSUP:
>>> + // these errors are OK for retrying with small pages
>>> + break;
>>> +
>>> + default:
>>> + // Any remaining errors on this OS can cause our reserved mapping
>>> + // to be lost. That can cause confusion where different data
>>> + // structures think they have the same memory mapped. The worst
>>> + // scenario is if both the VM and a library think they have the
>>> + // same memory mapped.
>>> + //
>>> + // However, it is not clear that this loss of our reserved
>>> mapping
>>> + // happens with large pages on Linux or that we cannot recover
>>> + // from the loss. For now, we just issue a warning and we don't
>>> + // call vm_exit_out_of_memory(). This issue is being tracked by
>>> + // JBS-8007074.
>>> + warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>>> +#if 0
>>> + vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>> + "committing reserved memory.");
>>> +#endif
>>> + break;
>>> + }
>>>
>>> These construct are rather long and duplicated. I'd prefer if this
>>> could be extracted out into a helper function so that the logic of
>>> commit_memory_impl is a bit clearer.
>>
>> I agree that this could be improved. You had recommended this:
>>
>> if (!recoverable_error(err)) {
>> warn_fail_commit_memory(addr, size, exec, err);
>> vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>> memory.");
>> }
>>
>> and this:
>>
>> if (!recoverable_error(err)) {
>> warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>> #if 0
>> vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>> memory.");
>> #endif
>> }
>>
>> I concur that the identical parts can be refactored into
>> recoverable_error(), but the "However, it is not clear..."
>> paragraph needs to stay with disabled vm_exit_out_of_memory()
>> call. BTW, Coleen requested (off-thread) that the call be
>> commented out since of #if'ed out.
>>
>> I'll definitely make these changes if I have to re-roll the
>> patch for non-code style changes. Depending on how this
>> changeset lines up with RT_Baseline's schedule this week,
>> I may even make the changes anyway.
>>
>> Again, thanks for the many review cycles.
>>
>> Dan
>>
>>
>>>
>>> thanks,
>>> StefanK
>>>
>>>>
>>>> Testing:
>>>> - Aurora Adhoc vm.quick batch for all OSes in the following configs:
>>>> {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>> - I've created a standalone Java stress test with a shell script
>>>> wrapper that reproduces the failing code paths on my Solaris X86
>>>> server and on Ron's DevOps Linux machine. This test will not be
>>>> integrated since running the machine out of swap space is very
>>>> disruptive (crashes the window system, causes various services to
>>>> exit, etc.)
>>>>
>>>> There are three parts to this fix:
>>>>
>>>> 1) Detect commit memory failures on Linux and Solaris where the
>>>> previous reservation can be lost and call vm_exit_out_of_memory()
>>>> to report the resource exhaustion. Add os::commit_memory_or_exit()
>>>> API to provide more consistent handling of vm_exit_out_of_memory()
>>>> calls.
>>>> 2) Change existing os::commit_memory() calls to make the executable
>>>> status of memory more clear; this makes security analysis easier.
>>>> 3) Clean up some platform dependent layer calls that were resulting
>>>> in extra NMT accounting. Clean up some mmap() return value checks.
>>>>
>>>> Gory details are below. As always, comments, questions and
>>>> suggestions are welome.
>>>>
>>>> Dan
>>>>
>>>>
>>>> Gory Details:
>>>>
>>>> The VirtualSpace data structure is built on top of the ReservedSpace
>>>> data structure. VirtualSpace presumes that failed os::commit_memory()
>>>> calls do not affect the underlying ReservedSpace memory mappings.
>>>> That assumption is true on MacOS X and Windows, but it is not true
>>>> on Linux or Solaris. The mmap() system call on Linux or Solaris can
>>>> lose previous mappings in the event of certain errors. On MacOS X,
>>>> the mmap() system call clearly states that previous mappings are
>>>> replaced only on success. On Windows, a different set of APIs are
>>>> used and they do not document any loss of previous mappings.
>>>>
>>>> The solution is to implement the proper failure checks in the
>>>> os::commit_memory() implementations on Linux and Solaris. On MacOS X
>>>> and Windows, no additional checks are needed.
>>>>
>>>> During code review round 1, there was a request from the GC team to
>>>> provide an os::commit_memory_or_exit() entry point in order to preserve
>>>> the existing error messages on all platforms. This entry point allows
>>>> code like this:
>>>>
>>>> src/share/vm/gc_implementation/parallelScavenge/cardTableExtension.cpp:
>>>>
>>>> 568 if (!os::commit_memory((char*)new_committed.start(),
>>>> 569 new_committed.byte_size())) {
>>>> 570 vm_exit_out_of_memory(new_committed.byte_size(),
>>>> OOM_MMAP_ERROR,
>>>> 571 "card table expansion");
>>>>
>>>> to be replaced with code like this:
>>>>
>>>> 568 os::commit_memory_or_exit((char*)new_committed.start(),
>>>> 569 new_committed.byte_size(), !ExecMem,
>>>> 570 "card table expansion");
>>>>
>>>> All uses of os::commit_memory() have been visited and those locations
>>>> that previously exited on error have been updated to use the new entry
>>>> point. This new entry point cleans up the original call sites and the
>>>> vm_exit_out_of_memory() calls are now consistent on all platforms.
>>>>
>>>> As a secondary change, while visiting all os::commit_memory() calls, I
>>>> also updated them to use the new ExecMem enum in order to make the
>>>> executable status of the memory more clear. Since executable memory can
>>>> be an attack vector, it is prudent to make the executable status of
>>>> memory crystal clear. This also allowed me to remove the default
>>>> executable flag value of 'false'. Now all new uses of commit_memory()
>>>> must be clear about the executable status of the memory.
>>>>
>>>> There are also tertiary changes where some of the pd_commit_memory()
>>>> calls were calling os::commit_memory() instead of calling their sibling
>>>> os::pd_commit_memory(). This resulted in double NMT tracking so this
>>>> has also been fixed. There were also some incorrect mmap)() return
>>>> value checks which have been fixed.
>>>>
>>>> Just to be clear: This fix simply properly detects the "out of swap
>>>> space" condition on Linux and Solaris and causes the VM to fail in a
>>>> more orderly fashion with a message that looks like this:
>>>>
>>>> The Java process' stderr will show:
>>>>
>>>> INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0) failed;
>>>> errno=11
>>>> #
>>>> # There is insufficient memory for the Java Runtime Environment to
>>>> continue.
>>>> # Native memory allocation (mmap) failed to map 4096 bytes for
>>>> committing reserved memory.
>>>> # An error report file with more information is saved as:
>>>> # /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log
>>>>
>>>> The hs_err_pid file will have the more verbose info:
>>>>
>>>> #
>>>> # There is insufficient memory for the Java Runtime Environment to
>>>> continue.
>>>> # Native memory allocation (mmap) failed to map 4096 bytes for
>>>> committing reserved memory.
>>>> # Possible reasons:
>>>> # The system is out of physical RAM or swap space
>>>> # In 32 bit mode, the process size limit was hit
>>>> # Possible solutions:
>>>> # Reduce memory load on the system
>>>> # Increase physical memory or swap space
>>>> # Check if swap backing store is full
>>>> # Use 64 bit Java on a 64 bit OS
>>>> # Decrease Java heap size (-Xmx/-Xms)
>>>> # Decrease number of Java threads
>>>> # Decrease Java thread stack sizes (-Xss)
>>>> # Set larger code cache with -XX:ReservedCodeCacheSize=
>>>> # This output file may be truncated or incomplete.
>>>> #
>>>> # Out of Memory Error
>>>> (/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/s
>>>> olaris/vm/os_solaris.cpp:2791), pid=9111, tid=21
>>>> #
>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build
>>>> 1.8.0-ea-b89)
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>> (25.0-b33-bh_hsx_rt_exp_8013057_dcu
>>>> bed-product-fastdebug mixed mode solaris-amd64 compressed oops)
>>>> # Core dump written. Default location:
>>>> /work/shared/bugs/8013057/looper.03/core
>>>> or core.9111
>>>> #
>>>>
>>>> You might be wondering why we are assuming that the failed mmap()
>>>> commit operation has lost the 'reserved memory' mapping.
>>>>
>>>> We have no good way to determine if the 'reserved memory' mapping
>>>> is lost. Since all the other threads are not idle, it is possible
>>>> for another thread to have 'reserved' the same memory space for a
>>>> different data structure. Our thread could observe that the memory
>>>> is still 'reserved' but we have no way to know that the reservation
>>>> isn't ours.
>>>>
>>>> You might be wondering why we can't recover from this transient
>>>> resource availability issue.
>>>>
>>>> We could retry the failed mmap() commit operation, but we would
>>>> again run into the issue that we no longer know which data
>>>> structure 'owns' the 'reserved' memory mapping. In particular, the
>>>> memory could be reserved by native code calling mmap() directly so
>>>> the VM really has no way to recover from this failure.
>>>>
>>>> You might be wondering why part of his work is deferred:
>>>>
>>>> 2654 default:
>>>> 2655 // Any remaining errors on this OS can cause our reserved
>>>> mapping
>>>> 2656 // to be lost. That can cause confusion where different data
>>>> 2657 // structures think they have the same memory mapped. The
>>>> worst
>>>> 2658 // scenario is if both the VM and a library think they
>>>> have the
>>>> 2659 // same memory mapped.
>>>> 2660 //
>>>> 2661 // However, it is not clear that this loss of our
>>>> reserved mapping
>>>> 2662 // happens with large pages on Linux or that we cannot
>>>> recover
>>>> 2663 // from the loss. For now, we just issue a warning and we
>>>> don't
>>>> 2664 // call vm_exit_out_of_memory(). This issue is being
>>>> tracked by
>>>> 2665 // JBS-8007074.
>>>> 2666 warn_fail_commit_memory(addr, size, alignment_hint, exec,
>>>> err);
>>>> 2667 #if 0
>>>> 2668 vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>> 2669 "committing reserved memory.");
>>>> 2670 #endif
>>>> 2671 break;
>>>>
>>>> When lines 2668-2669 are enabled and UseHugeTLBFS is specified,
>>>> then the VM will exit because no more huge/large pages are
>>>> available. It is not yet clear that this transition from large to
>>>> small pages is actually unsafe, but we don't yet have proof that
>>>> it is safe either. More research will be done via JBS-8007074.
>>>>
>>>
>>
>>
>>
>
--
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the source code.
More information about the hotspot-runtime-dev
mailing list