code review (round 1) for memory commit failure fix (8013057)

Tue May 28 13:08:06 PDT 2013

Stefan,

Thanks for getting back to me! Replies embedded below...

On 5/28/13 1:02 PM, Stefan Karlsson wrote:
> On 5/28/13 5:55 PM, Daniel D. Daugherty wrote:
>> Stefan,
>>
>> Thanks for the re-review! Replies embedded below.
>>
>>
>> On 5/28/13 2:56 AM, Stefan Karlsson wrote:
>>> On 05/24/2013 08:23 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have a revised version of the proposed fix for the following bug:
>>>>
>>>>     8013057 assert(_needs_gc || 
>>>> SafepointSynchronize::is_at_safepoint())
>>>>             failed: only read at safepoint
>>>>
>>>> Here are the (round 1) webrev URLs:
>>>>
>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/
>>>> Internal: 
>>>> http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/1-hsx25/
>>>
>>> Your patch exits the VM if we fail to get memory in 
>>> os::commit_memory. There are a couple of places where we already 
>>> have error messages, what should we do about them? For example:
>>>   if (!os::commit_memory((char*)guard_page, _page_size, _page_size)) {
>>>     // Do better than this for Merlin
>>>     vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR, "card table 
>>> last card");
>>>   }
>>
>> Actually, my patch only exits the VM on Linux and Solaris only for
>> certain mmap() error code values so we shouldn't do anything with
>> existing code that exits on os::commit_memory() failures.
>>
>> For the specific example above, it is in platform independent code in
>> src/share/vm/memory/cardTableModRefBS.cpp so we definitely don't want
>> to remove that check. That would be bad for non-Linux and non-Solaris
>> platforms.
>
> I talked to the GC team today, and we don't want to loose the hints 
> that tell what the memory is allocated for. The code you're suggesting 
> will report the following, for the example above, on Linux/Solaris:
>
> +      warning("INFO: os::commit_memory(" PTR_FORMAT ", " SIZE_FORMAT
> +              ", " SIZE_FORMAT ", %d) failed; errno=%d", addr, size,
> +              alignment_hint, exec, err);
> +      vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
> +                            "committing reserved memory.");
>
> and for the other platforms:
>
>            vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR, "card table
>        last card");
>        
> We'd like to get the"card table last card"  string into the Linux/Solaris error message, if possible.

Well, it is software so just about anything is possible... :-)

How about a new optional parameter to os::commit_memory() where an
"alt_mesg" can be passed in? If alt_mesg != NULL (and we're on a
platform that call vm_exit_out_of_memory() the platform dependent
code), we'll use the alt_mesg instead of "committing reserved memory".
Does that sound acceptable?

Of course, I'll have to revisit and potentially update many more calls
to os::commit_memory().

>   
>>
>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/src/os/linux/vm/os_linux.cpp.frames.html 
>>>
>>>  Lines 2564-2578:
>>>
>>> I don't think we want to exit immediately if we can't get large 
>>> pages. With this change JVMs will stop to run if to few large pages 
>>> have been setup. I think this will affect a lot of users.
>>>
>>> This code path should probably be handled by:
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007074
>>
>> I suppose that depends on whether the failure to get large pages
>> causes the underlying reservation mapping to get lost.
> Yes, we'll loose the reservation but we'll get it back a couple of 
> rows bellow in os::commit_memory. That's why 8007074 wasn't found 
> immediately.

Ummmmm. As soon as the reservation is lost, then something else in
the same process can come along a mmap() that memory... It could be
a native library over which you have no control...

>
>> It also
>> depends on the errno value when large page mmap() fails.
>
> ENOMEM
>
>>
>> If you prefer, I can back out lines 2606-2621 in the following
>> function:
>>
>> 2592 bool os::pd_commit_memory(char* addr, size_t size, size_t alignment_hint,
>> 2593                        bool exec) {
>> 2594   if (UseHugeTLBFS && alignment_hint > (size_t)vm_page_size()) {
>> 2595     int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC : PROT_READ|PROT_WRITE;
>> 2596     uintptr_t res =
>> 2597       (uintptr_t) ::mmap(addr, size, prot,
>> 2598                          MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_HUGETLB,
>> 2599                          -1, 0);
>> 2600     if (res != (uintptr_t) MAP_FAILED) {
>> 2601       if (UseNUMAInterleaving) {
>> 2602         numa_make_global(addr, size);
>> 2603       }
>> 2604       return true;
>> 2605     }
>> 2606
>> 2607     int err = errno;  // save errno from mmap() call above
>> 2608     switch (err) {
>> 2609     case EBADF:
>> 2610     case EINVAL:
>> 2611     case ENOTSUP:
>> 2612       break;
>> 2613
>> 2614     default:
>> 2615       warning("INFO: os::commit_memory(" PTR_FORMAT ", " SIZE_FORMAT
>> 2616               ", " SIZE_FORMAT ", %d) failed; errno=%d", addr, size,
>> 2617               alignment_hint, exec, err);
>> 2618       vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>> 2619                             "committing reserved memory.");
>> 2620       break;
>> 2621     }
>> 2622     // Fall through and try to use small pages
>> 2623   }
>> 2624
>> 2625   if (pd_commit_memory(addr, size, exec)) {
>> 2626     realign_memory(addr, size, alignment_hint);
>> 2627     return true;
>> 2628   }
>> 2629   return false;
>> 2630 }
>> and then I can leave that potential location for your work on 8007074.
>
> I think that would be good.

Based on your reply above that the reservation can be lost, now
I need to be convinced why you think it is OK to leave it for
now...

>
>>
>> Before I back out the code, I would be interested in exercising it
>> in a large page config on a Linux machine. Can you give me some info
>> about how to enable large pages on Linux?
>
> I have the instructions in the bug report. Contact me if that doesn't 
> work.

I thought of that after I sent my reply this morning. I'm doing
basic testing on Ron D's Linux machine right now. Large pages
will be next...

Dan

>
>>
>> I'll have to see about getting access to a Linux machine. I don't
>> have one of those in my lab. (More on that below)
>>
>>
>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/1-hsx25/src/os/linux/vm/os_linux.cpp.frames.html 
>>>
>>>  Lines 2614-2620:
>>>
>>> Do we ever end up failing like this on Linux? Have you been able to 
>>> reproduce this on Linux?
>>
>> According to this bug:
>>
>>     JDK-6843484 os::commit_memory() failures are not handled properly 
>> on linux
>> https://jbs.oracle.com/bugs/browse/JDK-6843484
>>
>> we do run into this issue on Linux. However, I have not tried my
>> reproducer on a Linux machine. I'm a bit worried about doing that
>> since I swamped my local Solaris server so bad that I had to power
>> cycle it Friday night.
>>
>> I will investigate getting access to a remote Linux machine and I'll
>> check into how they get rebooted, get unstuck, etc... I'm worried
>> about screwing up someone else's machine so I'll see about getting
>> my own...
>
> OK.
>
> StefanK
>>
>> Dan
>>
>>
>>
>>>
>>> StefanK
>>>
>>>>
>>>> Testing:
>>>> - Aurora Adhoc vm.quick batch for all OSes in the following configs:
>>>>   {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>> - I've created a standalone Java stress test with a shell script
>>>>   wrapper that reproduces the failing code paths on my Solaris X86
>>>>   server. This test will not be integrated since running the machine
>>>>   out of swap space is very disruptive (crashes the window system,
>>>>   causes various services to exit, etc.)
>>>>
>>>> Gory details are below. As always, comments, questions and
>>>> suggestions are welome.
>>>>
>>>> Dan
>>>>
>>>>
>>>> Gory Details:
>>>>
>>>> The VirtualSpace data structure is built on top of the ReservedSpace
>>>> data structure. VirtualSpace presumes that failed os::commit_memory()
>>>> calls do not affect the underlying ReservedSpace memory mappings.
>>>> That assumption is true on MacOS X and Windows, but it is not true
>>>> on Linux or Solaris. The mmap() system call on Linux or Solaris can
>>>> lose previous mappings in the event of certain errors. On MacOS X,
>>>> the mmap() system call clearly states that previous mappings are
>>>> replaced only on success. On Windows, a different set of APIs are
>>>> used and they do not document any loss of previous mappings.
>>>>
>>>> The solution is to implement the proper failure checks in the
>>>> os::commit_memory() implementations on Linux and Solaris. On MacOS X
>>>> and Windows, no additional checks are needed.
>>>>
>>>> There is also a secondary change where some of the pd_commit_memory()
>>>> calls were calling os::commit_memory() instead of calling their 
>>>> sibling
>>>> os::pd_commit_memory(). This resulted in double NMT tracking so this
>>>> has also been fixed. There were also some incorrect mmap)() return
>>>> value checks which have been fixed.
>>>>
>>>> Just to be clear: This fix simply properly detects the "out of swap
>>>> space" condition on Linux and Solaris and causes the VM to fail in a
>>>> more orderly fashion with a message that looks like this:
>>>>
>>>> The Java process' stderr will show:
>>>>
>>>> INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0) failed; 
>>>> errno=11
>>>> #
>>>> # There is insufficient memory for the Java Runtime Environment to 
>>>> continue.
>>>> # Native memory allocation (mmap) failed to map 4096 bytes for 
>>>> committing reserved memory.
>>>> # An error report file with more information is saved as:
>>>> # /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log
>>>>
>>>> The hs_err_pid file will have the more verbose info:
>>>>
>>>> #
>>>> # There is insufficient memory for the Java Runtime Environment to 
>>>> continue.
>>>> # Native memory allocation (mmap) failed to map 4096 bytes for 
>>>> committing reserved memory.
>>>> # Possible reasons:
>>>> #   The system is out of physical RAM or swap space
>>>> #   In 32 bit mode, the process size limit was hit
>>>> # Possible solutions:
>>>> #   Reduce memory load on the system
>>>> #   Increase physical memory or swap space
>>>> #   Check if swap backing store is full
>>>> #   Use 64 bit Java on a 64 bit OS
>>>> #   Decrease Java heap size (-Xmx/-Xms)
>>>> #   Decrease number of Java threads
>>>> #   Decrease Java thread stack sizes (-Xss)
>>>> #   Set larger code cache with -XX:ReservedCodeCacheSize=
>>>> # This output file may be truncated or incomplete.
>>>> #
>>>> #  Out of Memory Error 
>>>> (/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/s
>>>> olaris/vm/os_solaris.cpp:2791), pid=9111, tid=21
>>>> #
>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build 
>>>> 1.8.0-ea-b89)
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM 
>>>> (25.0-b33-bh_hsx_rt_exp_8013057_dcu
>>>> bed-product-fastdebug mixed mode solaris-amd64 compressed oops)
>>>> # Core dump written. Default location: 
>>>> /work/shared/bugs/8013057/looper.03/core
>>>> or core.9111
>>>> #
>>>>
>>>> You might be wondering why we are assuming that the failed mmap()
>>>> commit operation has lost the 'reserved memory' mapping.
>>>>
>>>>     We have no good way to determine if the 'reserved memory' mapping
>>>>     is lost. Since all the other threads are not idle, it is possible
>>>>     for another thread to have 'reserved' the same memory space for a
>>>>     different data structure. Our thread could observe that the memory
>>>>     is still 'reserved' but we have no way to know that the 
>>>> reservation
>>>>     isn't ours.
>>>>
>>>> You might be wondering why we can't recover from this transient
>>>> resource availability issue.
>>>>
>>>>     We could retry the failed mmap() commit operation, but we would
>>>>     again run into the issue that we no longer know which data
>>>>     structure 'owns' the 'reserved' memory mapping. In particular, the
>>>>     memory could be reserved by native code calling mmap() directly so
>>>>     the VM really has no way to recover from this failure.
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130528/04561eb1/attachment-0001.html