code review (round 1) for memory commit failure fix (8013057)
Stefan Karlsson
stefan.karlsson at oracle.com
Tue May 28 14:51:44 PDT 2013
On 5/28/13 11:03 PM, Daniel D. Daugherty wrote:
> On 5/28/13 2:47 PM, Stefan Karlsson wrote:
>> On 5/28/13 10:08 PM, Daniel D. Daugherty wrote:
>>> Stefan,
>>>
>>> Thanks for getting back to me! Replies embedded below...
>>>
>>>
>>> On 5/28/13 1:02 PM, Stefan Karlsson wrote:
>>>> On 5/28/13 5:55 PM, Daniel D. Daugherty wrote:
>>>>> Stefan,
>>>>>
>>>>> Thanks for the re-review! Replies embedded below.
>>>>>
>>>>>
>>>>> On 5/28/13 2:56 AM, Stefan Karlsson wrote:
>>>>>> On 05/24/2013 08:23 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a revised version of the proposed fix for the following bug:
>>>>>>>
>>>>>>> 8013057 assert(_needs_gc ||
>>>>>>> SafepointSynchronize::is_at_safepoint())
>>>>>>> failed: only read at safepoint
>>>>>>>
>>>>>>> Here are the (round 1) webrev URLs:
>>>>>>>
>>>>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/
>>>>>>> Internal:
>>>>>>> http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/1-hsx25/
>>>>>>
>>>>>> Your patch exits the VM if we fail to get memory in
>>>>>> os::commit_memory. There are a couple of places where we already
>>>>>> have error messages, what should we do about them? For example:
>>>>>>   if (!os::commit_memory((char*)guard_page, _page_size, _page_size)) {
>>>>>>     // Do better than this for Merlin
>>>>>>     vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR,
>>>>>>                           "card table last card");
>>>>>>   }
>>>>>
>>>>> Actually, my patch exits the VM only on Linux and Solaris, and only
>>>>> for certain mmap() error code values, so we shouldn't do anything
>>>>> with existing code that exits on os::commit_memory() failures.
>>>>>
>>>>> For the specific example above, it is in platform independent code in
>>>>> src/share/vm/memory/cardTableModRefBS.cpp so we definitely don't want
>>>>> to remove that check. That would be bad for non-Linux and non-Solaris
>>>>> platforms.
>>>>
>>>> I talked to the GC team today, and we don't want to lose the hints
>>>> that tell what the memory is allocated for. The code you're
>>>> suggesting will report the following, for the example above, on
>>>> Linux/Solaris:
>>>>
>>>> + warning("INFO: os::commit_memory(" PTR_FORMAT ", " SIZE_FORMAT
>>>> + ", " SIZE_FORMAT ", %d) failed; errno=%d", addr, size,
>>>> + alignment_hint, exec, err);
>>>> + vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>> + "committing reserved memory.");
>>>>
>>>> and for the other platforms:
>>>>
>>>> vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR, "card table last card");
>>>>
>>>> We'd like to get the "card table last card" string into the Linux/Solaris error message, if possible.
>>>
>>> Well, it is software so just about anything is possible... :-)
>>>
>>> How about a new optional parameter to os::commit_memory() where an
>>> "alt_mesg" can be passed in? If alt_mesg != NULL (and we're on a
>>> platform that call vm_exit_out_of_memory() the platform dependent
>>> code), we'll use the alt_mesg instead of "committing reserved memory".
>>> Does that sound acceptable?
>>>
>>> Of course, I'll have to revisit and potentially update many more calls
>>> to os::commit_memory().
>> I need to think about that. I don't think it looks that nice if we
>> duplicate that string, like:
>> if (!os::commit_memory(..., "card table last card")) {
>>   vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR, "card table last card");
>> }
>
> Could solve that with a local string...
>
>
>> Maybe we could have a:
>> os::commit_memory_or_exit(..., "card table last card")
>
> So we would have both commit_memory() and commit_memory_or_exit()?
> Not liking that... Also that would require more new code on MacOS X
> and Windows that we don't currently need.
>
But now you have two vm_exit_out_of_memory calls in the path and you
need to figure out which one is going to be called. I'm not liking that
either.
I'm sure you could make commit_memory_or_exit a thin wrapper around
os::commit_memory.
os::commit_memory_or_exit(..., msg) {
  if (!os::commit_memory(..., no_exit_on_OOM)) {
    vm_exit_out_of_memory(...);
  }
}
or whatever makes the caller code a bit cleaner.
>
>>
>>>
>>>
>>>>
>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/src/os/linux/vm/os_linux.cpp.frames.html
>>>>>>
>>>>>> Lines 2564-2578:
>>>>>>
>>>>>> I don't think we want to exit immediately if we can't get large
>>>>>> pages. With this change JVMs will fail to run if too few large
>>>>>> pages have been set up. I think this will affect a lot of users.
>>>>>>
>>>>>> This code path should probably be handled by:
>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007074
>>>>>
>>>>> I suppose that depends on whether the failure to get large pages
>>>>> causes the underlying reservation mapping to get lost.
>>>> Yes, we'll lose the reservation, but we'll get it back a couple of
>>>> rows below in os::commit_memory. That's why 8007074 wasn't found
>>>> immediately.
>>>
>>> Ummmmm. As soon as the reservation is lost, something else in
>>> the same process can come along and mmap() that memory... It could be
>>> a native library over which you have no control...
>>
>> I agree about the dangers of this code. I'm just concerned that
>> this patch will shut down JVMs on machines that end up using all
>> their large pages or get their large pages fragmented. This will
>> probably make the default Large Pages implementation in HotSpot
>> unusable on Linux.
>
> Hopefully more testing will help me understand more about how things
> work on Linux. Right now, I'm having trouble just getting to the
> regular failure point let alone a large pages failure...
>
>
>>
>>>
>>>
>>>>
>>>>> It also
>>>>> depends on the errno value when large page mmap() fails.
>>>>
>>>> ENOMEM
>>>>
>>>>>
>>>>> If you prefer, I can back out lines 2606-2621 in the following
>>>>> function:
>>>>>
>>>>> 2592 bool os::pd_commit_memory(char* addr, size_t size, size_t alignment_hint,
>>>>> 2593 bool exec) {
>>>>> 2594 if (UseHugeTLBFS && alignment_hint > (size_t)vm_page_size()) {
>>>>> 2595 int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC : PROT_READ|PROT_WRITE;
>>>>> 2596 uintptr_t res =
>>>>> 2597 (uintptr_t) ::mmap(addr, size, prot,
>>>>> 2598 MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_HUGETLB,
>>>>> 2599 -1, 0);
>>>>> 2600 if (res != (uintptr_t) MAP_FAILED) {
>>>>> 2601 if (UseNUMAInterleaving) {
>>>>> 2602 numa_make_global(addr, size);
>>>>> 2603 }
>>>>> 2604 return true;
>>>>> 2605 }
>>>>> 2606
>>>>> 2607 int err = errno; // save errno from mmap() call above
>>>>> 2608 switch (err) {
>>>>> 2609 case EBADF:
>>>>> 2610 case EINVAL:
>>>>> 2611 case ENOTSUP:
>>>>> 2612 break;
>>>>> 2613
>>>>> 2614 default:
>>>>> 2615 warning("INFO: os::commit_memory(" PTR_FORMAT ", " SIZE_FORMAT
>>>>> 2616 ", " SIZE_FORMAT ", %d) failed; errno=%d", addr, size,
>>>>> 2617 alignment_hint, exec, err);
>>>>> 2618 vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>>> 2619 "committing reserved memory.");
>>>>> 2620 break;
>>>>> 2621 }
>>>>> 2622 // Fall through and try to use small pages
>>>>> 2623 }
>>>>> 2624
>>>>> 2625 if (pd_commit_memory(addr, size, exec)) {
>>>>> 2626 realign_memory(addr, size, alignment_hint);
>>>>> 2627 return true;
>>>>> 2628 }
>>>>> 2629 return false;
>>>>> 2630 }
>>>>> and then I can leave that potential location for your work on 8007074.
>>>>
>>>> I think that would be good.
>>>
>>> Based on your reply above that the reservation can be lost, now
>>> I need to be convinced why you think it is OK to leave it for
>>> now...
>>
>> I just see that we have two bad solutions at the moment: one that
>> exits the JVM and the other that _might_ crash the JVM. I think
>> neither is satisfactory for the users.
>
> The current situation is worse. What happens if the transient out of
> memory condition doesn't crash the JVM, but just corrupts it in
> subtle ways... and the JVM just propagates the corrupted data... :-(
If you think this is the correct way forward, could you do this as a patch
for 8007074 and handle the politics around doing this?
Just to be clear. Doing the following:
$ echo 1 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
will most likely cause all subsequent JVM invocations to fail at startup
with this patch.
thanks,
StefanK
> Dan
>
>
>>
>> StefanK
>>>
>>>
>>>>
>>>>>
>>>>> Before I back out the code, I would be interested in exercising it
>>>>> in a large page config on a Linux machine. Can you give me some info
>>>>> about how to enable large pages on Linux?
>>>>
>>>> I have the instructions in the bug report. Contact me if that
>>>> doesn't work.
>>>
>>> I thought of that after I sent my reply this morning. I'm doing
>>> basic testing on Ron D's Linux machine right now. Large pages
>>> will be next...
>>>
>>> Dan
>>>
>>>
>>>>
>>>>>
>>>>> I'll have to see about getting access to a Linux machine. I don't
>>>>> have one of those in my lab. (More on that below)
>>>>>
>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/src/os/linux/vm/os_linux.cpp.frames.html
>>>>>>
>>>>>> Lines 2614-2620:
>>>>>>
>>>>>> Do we ever end up failing like this on Linux? Have you been able
>>>>>> to reproduce this on Linux?
>>>>>
>>>>> According to this bug:
>>>>>
>>>>> JDK-6843484 os::commit_memory() failures are not handled
>>>>> properly on linux
>>>>> https://jbs.oracle.com/bugs/browse/JDK-6843484
>>>>>
>>>>> we do run into this issue on Linux. However, I have not tried my
>>>>> reproducer on a Linux machine. I'm a bit worried about doing that
>>>>> since I swamped my local Solaris server so badly that I had to
>>>>> power cycle it Friday night.
>>>>>
>>>>> I will investigate getting access to a remote Linux machine and I'll
>>>>> check into how they get rebooted, get unstuck, etc... I'm worried
>>>>> about screwing up someone else's machine so I'll see about getting
>>>>> my own...
>>>>
>>>> OK.
>>>>
>>>> StefanK
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> StefanK
>>>>>>
>>>>>>>
>>>>>>> Testing:
>>>>>>> - Aurora Adhoc vm.quick batch for all OSes in the following
>>>>>>> configs:
>>>>>>> {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>>>>> - I've created a standalone Java stress test with a shell script
>>>>>>> wrapper that reproduces the failing code paths on my Solaris X86
>>>>>>> server. This test will not be integrated since running the
>>>>>>> machine
>>>>>>> out of swap space is very disruptive (crashes the window system,
>>>>>>> causes various services to exit, etc.)
>>>>>>>
>>>>>>> Gory details are below. As always, comments, questions and
>>>>>>> suggestions are welcome.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> Gory Details:
>>>>>>>
>>>>>>> The VirtualSpace data structure is built on top of the
>>>>>>> ReservedSpace
>>>>>>> data structure. VirtualSpace presumes that failed
>>>>>>> os::commit_memory()
>>>>>>> calls do not affect the underlying ReservedSpace memory mappings.
>>>>>>> That assumption is true on MacOS X and Windows, but it is not true
>>>>>>> on Linux or Solaris. The mmap() system call on Linux or Solaris can
>>>>>>> lose previous mappings in the event of certain errors. On MacOS X,
>>>>>>> the mmap() system call clearly states that previous mappings are
>>>>>>> replaced only on success. On Windows, a different set of APIs are
>>>>>>> used and they do not document any loss of previous mappings.
>>>>>>>
>>>>>>> The solution is to implement the proper failure checks in the
>>>>>>> os::commit_memory() implementations on Linux and Solaris. On
>>>>>>> MacOS X
>>>>>>> and Windows, no additional checks are needed.
>>>>>>>
>>>>>>> There is also a secondary change where some of the
>>>>>>> pd_commit_memory()
>>>>>>> calls were calling os::commit_memory() instead of calling their
>>>>>>> sibling
>>>>>>> os::pd_commit_memory(). This resulted in double NMT tracking so
>>>>>>> this
>>>>>>> has also been fixed. There were also some incorrect mmap() return
>>>>>>> value checks which have been fixed.
>>>>>>>
>>>>>>> Just to be clear: This fix simply detects the "out of swap
>>>>>>> space" condition properly on Linux and Solaris and causes the
>>>>>>> VM to fail in a more orderly fashion with a message that looks
>>>>>>> like this:
>>>>>>>
>>>>>>> The Java process' stderr will show:
>>>>>>>
>>>>>>> INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0)
>>>>>>> failed; errno=11
>>>>>>> #
>>>>>>> # There is insufficient memory for the Java Runtime Environment
>>>>>>> to continue.
>>>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for
>>>>>>> committing reserved memory.
>>>>>>> # An error report file with more information is saved as:
>>>>>>> # /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log
>>>>>>>
>>>>>>> The hs_err_pid file will have the more verbose info:
>>>>>>>
>>>>>>> #
>>>>>>> # There is insufficient memory for the Java Runtime Environment
>>>>>>> to continue.
>>>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for
>>>>>>> committing reserved memory.
>>>>>>> # Possible reasons:
>>>>>>> # The system is out of physical RAM or swap space
>>>>>>> # In 32 bit mode, the process size limit was hit
>>>>>>> # Possible solutions:
>>>>>>> # Reduce memory load on the system
>>>>>>> # Increase physical memory or swap space
>>>>>>> # Check if swap backing store is full
>>>>>>> # Use 64 bit Java on a 64 bit OS
>>>>>>> # Decrease Java heap size (-Xmx/-Xms)
>>>>>>> # Decrease number of Java threads
>>>>>>> # Decrease Java thread stack sizes (-Xss)
>>>>>>> # Set larger code cache with -XX:ReservedCodeCacheSize=
>>>>>>> # This output file may be truncated or incomplete.
>>>>>>> #
>>>>>>> # Out of Memory Error
>>>>>>> (/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/s
>>>>>>> olaris/vm/os_solaris.cpp:2791), pid=9111, tid=21
>>>>>>> #
>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build
>>>>>>> 1.8.0-ea-b89)
>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>>>> (25.0-b33-bh_hsx_rt_exp_8013057_dcu
>>>>>>> bed-product-fastdebug mixed mode solaris-amd64 compressed oops)
>>>>>>> # Core dump written. Default location:
>>>>>>> /work/shared/bugs/8013057/looper.03/core
>>>>>>> or core.9111
>>>>>>> #
>>>>>>>
>>>>>>> You might be wondering why we are assuming that the failed mmap()
>>>>>>> commit operation has lost the 'reserved memory' mapping.
>>>>>>>
>>>>>>> We have no good way to determine if the 'reserved memory'
>>>>>>> mapping
>>>>>>> is lost. Since all the other threads are not idle, it is
>>>>>>> possible
>>>>>>> for another thread to have 'reserved' the same memory space
>>>>>>> for a
>>>>>>> different data structure. Our thread could observe that the
>>>>>>> memory
>>>>>>> is still 'reserved' but we have no way to know that the
>>>>>>> reservation
>>>>>>> isn't ours.
>>>>>>>
>>>>>>> You might be wondering why we can't recover from this transient
>>>>>>> resource availability issue.
>>>>>>>
>>>>>>> We could retry the failed mmap() commit operation, but we would
>>>>>>> again run into the issue that we no longer know which data
>>>>>>> structure 'owns' the 'reserved' memory mapping. In
>>>>>>> particular, the
>>>>>>> memory could be reserved by native code calling mmap()
>>>>>>> directly so
>>>>>>> the VM really has no way to recover from this failure.
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev mailing list