code review (round 1) for memory commit failure fix (8013057)

Stefan Karlsson stefan.karlsson at oracle.com
Tue May 28 14:51:44 PDT 2013


On 5/28/13 11:03 PM, Daniel D. Daugherty wrote:
> On 5/28/13 2:47 PM, Stefan Karlsson wrote:
>> On 5/28/13 10:08 PM, Daniel D. Daugherty wrote:
>>> Stefan,
>>>
>>> Thanks for getting back to me! Replies embedded below...
>>>
>>>
>>> On 5/28/13 1:02 PM, Stefan Karlsson wrote:
>>>> On 5/28/13 5:55 PM, Daniel D. Daugherty wrote:
>>>>> Stefan,
>>>>>
>>>>> Thanks for the re-review! Replies embedded below.
>>>>>
>>>>>
>>>>> On 5/28/13 2:56 AM, Stefan Karlsson wrote:
>>>>>> On 05/24/2013 08:23 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a revised version of the proposed fix for the following bug:
>>>>>>>
>>>>>>>     8013057 assert(_needs_gc || 
>>>>>>> SafepointSynchronize::is_at_safepoint())
>>>>>>>             failed: only read at safepoint
>>>>>>>
>>>>>>> Here are the (round 1) webrev URLs:
>>>>>>>
>>>>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/
>>>>>>> Internal: 
>>>>>>> http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/1-hsx25/
>>>>>>
>>>>>> Your patch exits the VM if we fail to get memory in 
>>>>>> os::commit_memory. There are a couple of places where we already 
>>>>>> have error messages, what should we do about them? For example:
>>>>>>   if (!os::commit_memory((char*)guard_page, _page_size, 
>>>>>> _page_size)) {
>>>>>>     // Do better than this for Merlin
>>>>>>     vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR, "card table 
>>>>>> last card");
>>>>>>   }
>>>>>
>>>>> Actually, my patch exits the VM only on Linux and Solaris, and only
>>>>> for certain mmap() error code values, so we shouldn't do anything with
>>>>> existing code that exits on os::commit_memory() failures.
>>>>>
>>>>> For the specific example above, it is in platform independent code in
>>>>> src/share/vm/memory/cardTableModRefBS.cpp so we definitely don't want
>>>>> to remove that check. That would be bad for non-Linux and non-Solaris
>>>>> platforms.
>>>>
>>>> I talked to the GC team today, and we don't want to lose the hints 
>>>> that tell what the memory is allocated for. The code you're 
>>>> suggesting will report the following, for the example above, on 
>>>> Linux/Solaris:
>>>>
>>>> +      warning("INFO: os::commit_memory(" PTR_FORMAT ", " SIZE_FORMAT
>>>> +              ", " SIZE_FORMAT ", %d) failed; errno=%d", addr, size,
>>>> +              alignment_hint, exec, err);
>>>> +      vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>> +                            "committing reserved memory.");
>>>>
>>>> and for the other platforms:
>>>>
>>>>            vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR, "card table
>>>>        last card");
>>>>        
>>>> We'd like to get the "card table last card" string into the Linux/Solaris error message, if possible.
>>>
>>> Well, it is software so just about anything is possible... :-)
>>>
>>> How about a new optional parameter to os::commit_memory() where an
>>> "alt_mesg" can be passed in? If alt_mesg != NULL (and we're on a
>>> platform that calls vm_exit_out_of_memory() in the platform-dependent
>>> code), we'll use the alt_mesg instead of "committing reserved memory".
>>> Does that sound acceptable?
>>>
>>> Of course, I'll have to revisit and potentially update many more calls
>>> to os::commit_memory().
>> I need to think about that. I don't think it looks that nice if we 
>> duplicate that string like:
>> if (!os::commit_memory(..., "card table last card")) {
>>   vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR, "card table last 
>> card");
>> }
>
> Could solve that with a local string...
>
>
>> Maybe we could have a:
>> os::commit_memory_or_exit(..., "card table last card")
>
> So we would have both commit_memory() and commit_memory_or_exit()?
> Not liking that... Also that would require more new code on MacOS X
> and Windows that we don't currently need.
>

But now you have two vm_exit_out_of_memory calls in the path and you 
need to figure out which one is going to be called. I'm not liking that 
either.

I'm sure you could make commit_memory_or_exit a thin wrapper around 
os::commit_memory.

os::commit_memory_or_exit(..., msg) {
   if (!os::commit_memory(..., no_exit_on_OOM)) {
     vm_exit_out_of_memory(..., msg)
   }
}

or whatever makes the caller code a bit cleaner.

>
>>
>>>
>>>
>>>>   
>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/src/os/linux/vm/os_linux.cpp.frames.html 
>>>>>>
>>>>>>  Lines 2564-2578:
>>>>>>
>>>>>> I don't think we want to exit immediately if we can't get large 
>>>>>> pages. With this change JVMs will stop running if too few large 
>>>>>> pages have been set up. I think this will affect a lot of users.
>>>>>>
>>>>>> This code path should probably be handled by:
>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007074
>>>>>
>>>>> I suppose that depends on whether the failure to get large pages
>>>>> causes the underlying reservation mapping to get lost.
>>>> Yes, we'll lose the reservation but we'll get it back a couple of 
>>>> lines below in os::commit_memory. That's why 8007074 wasn't found 
>>>> immediately.
>>>
>>> Ummmmm. As soon as the reservation is lost, then something else in
>>> the same process can come along and mmap() that memory... It could be
>>> a native library over which you have no control...
>>
>> I agree about the dangers with this code. I'm just concerned that 
>> this patch will shut down JVMs on machines that end up using all large 
>> pages or get their large pages fragmented. This will probably make 
>> the default Large Pages implementation in HotSpot unusable on Linux.
>
> Hopefully more testing will help me understand more about how things
> work on Linux. Right now, I'm having trouble just getting to the
> regular failure point let alone a large pages failure...
>
>
>>
>>>
>>>
>>>>
>>>>> It also
>>>>> depends on the errno value when large page mmap() fails.
>>>>
>>>> ENOMEM
>>>>
>>>>>
>>>>> If you prefer, I can back out lines 2606-2621 in the following
>>>>> function:
>>>>>
>>>>> 2592 bool os::pd_commit_memory(char* addr, size_t size, size_t alignment_hint,
>>>>> 2593                        bool exec) {
>>>>> 2594   if (UseHugeTLBFS && alignment_hint > (size_t)vm_page_size()) {
>>>>> 2595     int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC : PROT_READ|PROT_WRITE;
>>>>> 2596     uintptr_t res =
>>>>> 2597       (uintptr_t) ::mmap(addr, size, prot,
>>>>> 2598                          MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_HUGETLB,
>>>>> 2599                          -1, 0);
>>>>> 2600     if (res != (uintptr_t) MAP_FAILED) {
>>>>> 2601       if (UseNUMAInterleaving) {
>>>>> 2602         numa_make_global(addr, size);
>>>>> 2603       }
>>>>> 2604       return true;
>>>>> 2605     }
>>>>> 2606
>>>>> 2607     int err = errno;  // save errno from mmap() call above
>>>>> 2608     switch (err) {
>>>>> 2609     case EBADF:
>>>>> 2610     case EINVAL:
>>>>> 2611     case ENOTSUP:
>>>>> 2612       break;
>>>>> 2613
>>>>> 2614     default:
>>>>> 2615       warning("INFO: os::commit_memory(" PTR_FORMAT ", " SIZE_FORMAT
>>>>> 2616               ", " SIZE_FORMAT ", %d) failed; errno=%d", addr, size,
>>>>> 2617               alignment_hint, exec, err);
>>>>> 2618       vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>>> 2619                             "committing reserved memory.");
>>>>> 2620       break;
>>>>> 2621     }
>>>>> 2622     // Fall through and try to use small pages
>>>>> 2623   }
>>>>> 2624
>>>>> 2625   if (pd_commit_memory(addr, size, exec)) {
>>>>> 2626     realign_memory(addr, size, alignment_hint);
>>>>> 2627     return true;
>>>>> 2628   }
>>>>> 2629   return false;
>>>>> 2630 }
>>>>> and then I can leave that potential location for your work on 8007074.
>>>>
>>>> I think that would be good.
>>>
>>> Based on your reply above that the reservation can be lost, now
>>> I need to be convinced why you think it is OK to leave it for
>>> now...
>>
>> I just see that we have two bad solutions at the moment: one that 
>> exits the JVM and the other that _might_ crash the JVM. I think 
>> neither is satisfactory for the users.
>
> The current situation is worse. What happens if the transient out of
> memory condition doesn't crash the JVM, it just corrupts that JVM in
> subtle ways... and the JVM just propagates the corrupted data.... :-(
If you think this is the correct way forward, could you do this as a patch 
for 8007074 and handle the politics around doing this?

Just to be clear. Doing the following:
$ echo 1 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

will most likely cause all subsequent JVM invocations to fail at startup 
with this patch.

thanks,
StefanK

> Dan
>
>
>>
>> StefanK
>>>
>>>
>>>>
>>>>>
>>>>> Before I back out the code, I would be interested in exercising it
>>>>> in a large page config on a Linux machine. Can you give me some info
>>>>> about how to enable large pages on Linux?
>>>>
>>>> I have the instructions in the bug report. Contact me if that 
>>>> doesn't work.
>>>
>>> I thought of that after I sent my reply this morning. I'm doing
>>> basic testing on Ron D's Linux machine right now. Large pages
>>> will be next...
>>>
>>> Dan
>>>
>>>
>>>>
>>>>>
>>>>> I'll have to see about getting access to a Linux machine. I don't
>>>>> have one of those in my lab. (More on that below)
>>>>>
>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/src/os/linux/vm/os_linux.cpp.frames.html 
>>>>>>
>>>>>>  Lines 2614-2620:
>>>>>>
>>>>>> Do we ever end up failing like this on Linux? Have you been able 
>>>>>> to reproduce this on Linux?
>>>>>
>>>>> According to this bug:
>>>>>
>>>>>     JDK-6843484 os::commit_memory() failures are not handled 
>>>>> properly on linux
>>>>> https://jbs.oracle.com/bugs/browse/JDK-6843484
>>>>>
>>>>> we do run into this issue on Linux. However, I have not tried my
>>>>> reproducer on a Linux machine. I'm a bit worried about doing that
>>>>> since I swamped my local Solaris server so bad that I had to power
>>>>> cycle it Friday night.
>>>>>
>>>>> I will investigate getting access to a remote Linux machine and I'll
>>>>> check into how they get rebooted, get unstuck, etc... I'm worried
>>>>> about screwing up someone else's machine so I'll see about getting
>>>>> my own...
>>>>
>>>> OK.
>>>>
>>>> StefanK
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> StefanK
>>>>>>
>>>>>>>
>>>>>>> Testing:
>>>>>>> - Aurora Adhoc vm.quick batch for all OSes in the following 
>>>>>>> configs:
>>>>>>>   {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>>>>> - I've created a standalone Java stress test with a shell script
>>>>>>>   wrapper that reproduces the failing code paths on my Solaris X86
>>>>>>>   server. This test will not be integrated since running the 
>>>>>>> machine
>>>>>>>   out of swap space is very disruptive (crashes the window system,
>>>>>>>   causes various services to exit, etc.)
>>>>>>>
>>>>>>> Gory details are below. As always, comments, questions and
>>>>>>> suggestions are welcome.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> Gory Details:
>>>>>>>
>>>>>>> The VirtualSpace data structure is built on top of the 
>>>>>>> ReservedSpace
>>>>>>> data structure. VirtualSpace presumes that failed 
>>>>>>> os::commit_memory()
>>>>>>> calls do not affect the underlying ReservedSpace memory mappings.
>>>>>>> That assumption is true on MacOS X and Windows, but it is not true
>>>>>>> on Linux or Solaris. The mmap() system call on Linux or Solaris can
>>>>>>> lose previous mappings in the event of certain errors. On MacOS X,
>>>>>>> the mmap() system call clearly states that previous mappings are
>>>>>>> replaced only on success. On Windows, a different set of APIs are
>>>>>>> used and they do not document any loss of previous mappings.
>>>>>>>
>>>>>>> The solution is to implement the proper failure checks in the
>>>>>>> os::commit_memory() implementations on Linux and Solaris. On 
>>>>>>> MacOS X
>>>>>>> and Windows, no additional checks are needed.
>>>>>>>
>>>>>>> There is also a secondary change where some of the 
>>>>>>> pd_commit_memory()
>>>>>>> calls were calling os::commit_memory() instead of calling their 
>>>>>>> sibling
>>>>>>> os::pd_commit_memory(). This resulted in double NMT tracking so 
>>>>>>> this
>>>>>>> has also been fixed. There were also some incorrect mmap() return
>>>>>>> value checks which have been fixed.
>>>>>>>
>>>>>>> Just to be clear: This fix simply properly detects the "out of swap
>>>>>>> space" condition on Linux and Solaris and causes the VM to fail 
>>>>>>> in a
>>>>>>> more orderly fashion with a message that looks like this:
>>>>>>>
>>>>>>> The Java process' stderr will show:
>>>>>>>
>>>>>>> INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0) 
>>>>>>> failed; errno=11
>>>>>>> #
>>>>>>> # There is insufficient memory for the Java Runtime Environment 
>>>>>>> to continue.
>>>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for 
>>>>>>> committing reserved memory.
>>>>>>> # An error report file with more information is saved as:
>>>>>>> # /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log
>>>>>>>
>>>>>>> The hs_err_pid file will have the more verbose info:
>>>>>>>
>>>>>>> #
>>>>>>> # There is insufficient memory for the Java Runtime Environment 
>>>>>>> to continue.
>>>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for 
>>>>>>> committing reserved memory.
>>>>>>> # Possible reasons:
>>>>>>> #   The system is out of physical RAM or swap space
>>>>>>> #   In 32 bit mode, the process size limit was hit
>>>>>>> # Possible solutions:
>>>>>>> #   Reduce memory load on the system
>>>>>>> #   Increase physical memory or swap space
>>>>>>> #   Check if swap backing store is full
>>>>>>> #   Use 64 bit Java on a 64 bit OS
>>>>>>> #   Decrease Java heap size (-Xmx/-Xms)
>>>>>>> #   Decrease number of Java threads
>>>>>>> #   Decrease Java thread stack sizes (-Xss)
>>>>>>> #   Set larger code cache with -XX:ReservedCodeCacheSize=
>>>>>>> # This output file may be truncated or incomplete.
>>>>>>> #
>>>>>>> #  Out of Memory Error 
>>>>>>> (/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/s
>>>>>>> olaris/vm/os_solaris.cpp:2791), pid=9111, tid=21
>>>>>>> #
>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build 
>>>>>>> 1.8.0-ea-b89)
>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM 
>>>>>>> (25.0-b33-bh_hsx_rt_exp_8013057_dcu
>>>>>>> bed-product-fastdebug mixed mode solaris-amd64 compressed oops)
>>>>>>> # Core dump written. Default location: 
>>>>>>> /work/shared/bugs/8013057/looper.03/core
>>>>>>> or core.9111
>>>>>>> #
>>>>>>>
>>>>>>> You might be wondering why we are assuming that the failed mmap()
>>>>>>> commit operation has lost the 'reserved memory' mapping.
>>>>>>>
>>>>>>>     We have no good way to determine if the 'reserved memory' 
>>>>>>> mapping
>>>>>>>     is lost. Since all the other threads are not idle, it is 
>>>>>>> possible
>>>>>>>     for another thread to have 'reserved' the same memory space 
>>>>>>> for a
>>>>>>>     different data structure. Our thread could observe that the 
>>>>>>> memory
>>>>>>>     is still 'reserved' but we have no way to know that the 
>>>>>>> reservation
>>>>>>>     isn't ours.
>>>>>>>
>>>>>>> You might be wondering why we can't recover from this transient
>>>>>>> resource availability issue.
>>>>>>>
>>>>>>>     We could retry the failed mmap() commit operation, but we would
>>>>>>>     again run into the issue that we no longer know which data
>>>>>>>     structure 'owns' the 'reserved' memory mapping. In 
>>>>>>> particular, the
>>>>>>>     memory could be reserved by native code calling mmap() 
>>>>>>> directly so
>>>>>>>     the VM really has no way to recover from this failure.
>>>>>>
>>>>>
>>>>
>>>
>>
>
