Review request (hs24): 8007074: SIGSEGV at ParMarkBitMap::verify_clear()

Stefan Karlsson stefan.karlsson at oracle.com
Wed Aug 7 02:30:46 PDT 2013


Hi Volker,

Thanks for taking a look at this!

Inlined:

On 2013-07-16 19:21, Volker Simonis wrote:
> Hi Stefan,
>
> this is a very interesting change! Although I haven’t had a chance to
> dig through the whole change and test it, I want to make a few comments
> beforehand:

If you do find the time to test this, please share your findings.

>
> I think this change will help to make 'UseHugeTLBFS' work on
> Linux/PPC64. The problem with the current strategy on PPC64 is that on
> PPC64 we have different memory slices (256M slices below 4G, then 4G
> slices below 1TB and finally 1TB slices above that) and each slice
> supports only one page size (I got this information from Tiago Stürmer
> Daitx from the IBM Linux Technology Center:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2013-April/000445.html).
> So with the old strategy, the first mmap(MAP_NORESERVE) ends up in a
> memory slice with small pages and the subsequent
> mmap(MAP_FIXED,MAP_HUGETLB) will fail because the reserved memory is
> already located in a slice with small pages only. So I think your
> change where the large pages are committed upfront will solve this
> problem (though I haven’t tried it yet).

OK. It sounds like the current code would only work if you only allocate 
from the same slice and if LargePageSizeInBytes is set up correctly for 
that slice.
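
To make the slice constraint concrete, here is a minimal C sketch. It assumes
the slice layout exactly as Volker describes it above (256M slices below 4G,
4G slices below 1TB, 1TB slices above that). The function names are
hypothetical and are not part of HotSpot or the kernel; the sketch only shows
how one could check whether a range stays inside a single slice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MB (UINT64_C(1) << 20)
#define GB (UINT64_C(1) << 30)
#define TB (UINT64_C(1) << 40)

/* Slice size covering addr, per the layout quoted in this thread (assumption). */
uint64_t slice_size_at(uint64_t addr) {
    if (addr < 4 * GB) return 256 * MB;
    if (addr < 1 * TB) return 4 * GB;
    return 1 * TB;
}

/* Start address of the slice that contains addr. */
uint64_t slice_base(uint64_t addr) {
    uint64_t size = slice_size_at(addr);
    return addr - (addr % size);
}

/* True if [addr, addr + len) lies within one slice; len must be > 0. */
bool range_in_one_slice(uint64_t addr, uint64_t len) {
    assert(len > 0);
    return slice_base(addr) == slice_base(addr + len - 1);
}
```

A range that fails this check would span slices that may be set up for
different page sizes, which is the situation described above.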

>
> Currently, Linux/PPC64 doesn't support transparent huge pages, but
> there seems to be a new implementation in the upcoming Linux Kernel
> 3.11 (see: http://lkml.indiana.edu/hypermail/linux/kernel/1307.0/01916.html).
> I'm wondering how this can work with the 'memory slicing' mentioned
> before but that's another question which I'll try to find out from the
> IBM colleagues.
>
> I've also collected all kinds of information regarding
> LargePages/mmap/etc on Linux which I'd like to put in the HotSpot
> Wiki. I'd also like to add the explanations from your mail. Would it
> be OK for you if I'd create a new page (i.e. under
> HotSpot->Runtime->LargePageSupport) where we could collect this
> information?

Sure.

>
> How did you test transparent huge page support and did you compare it
> with the old UseHugeTLBFS?

I've done performance testing on SPECjbb2005 comparing without large 
pages, with transparent huge pages and UseHugeTLBFS. In those runs 
transparent huge pages are marginally slower than UseHugeTLBFS, but 
still significantly faster than running without large pages.

> I wonder how transparent huge page support
> works in the real world, because madvise is after all only advice.
> The result depends on the kernel settings
> (/sys/kernel/mm/transparent_hugepage/) as well as on how well
> 'khugepaged' works. If I read the transparent huge page
> documentation (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/transhuge.txt)
> correctly, there are still a lot of tuning parameters. I agree
> that the general usage is easier compared to 'UseHugeTLBFS' but on the
> other hand, once UseHugeTLBFS succeeds we have the huge pages forever.

I agree. I've been told that transparent huge pages can have a huge 
overhead.

Do you think it's wrong to default to transparent huge pages when the 
user has not specified any large pages flags? The other option would be 
to not turn on large pages at all, unless the user specifies large pages 
flags. I think it's OK to default to transparent huge pages, since the 
user has that option enabled in the OS and has the ability to turn 
them off if they are too intrusive.
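
As a rough illustration of the two pieces involved (the kernel setting and the
madvise hint), here is a minimal C sketch. The helper names are hypothetical
and this is not the HotSpot code. It assumes the sysfs file
/sys/kernel/mm/transparent_hugepage/enabled marks the active mode with square
brackets, e.g. "always [madvise] never".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Copy the bracketed word from a line such as "always [madvise] never"
 * into out. Returns 0 on success, -1 if there is no bracketed word or
 * it does not fit into out. */
int thp_selected_mode(const char *line, char *out, size_t out_len) {
    assert(line != NULL && out != NULL);
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (close == NULL) return -1;
    size_t n = (size_t)(close - open - 1);
    if (n + 1 > out_len) return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the system-wide THP mode from sysfs; -1 if it cannot be read. */
int thp_system_mode(char *out, size_t out_len) {
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL) return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? thp_selected_mode(line, out, out_len) : -1;
}

/* Hint that [addr, addr + len) should be backed by huge pages.
 * This is advice only: the kernel may ignore it. */
int thp_advise(void *addr, size_t len) {
    return madvise(addr, len, MADV_HUGEPAGE);
}
```

With the mode set to "madvise", only ranges passed to something like
thp_advise() are candidates for huge pages; with "never", the hint has no
effect.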

>
> And finally, are these changes really intended for hs24 as denoted in
> the subject?

They were intended for hs24, but we couldn't get enough testing done in 
time to include this rather large behavioral change.

Since this wasn't included in hs24, I'll try to bring this in through 
hs25 (JDK8) before, potentially, bringing this into a JDK7 update 
release. There are changes going into hs25 that conflict with my patch, 
so I need to resolve those issues before publishing the hs25 review request.

> Then I don't understand your comment that "Unfortunately,
> it's not likely that we'll get this into the hs24 release."

That comment was meant for Florian Weimer's question about:

JDK-8012015 : Use PROT_NONE when reserving memory
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8012015


>
> Thank you and best regards,
> Volker
>
> PS: by the way, where did you get the information from that a failing
> "mmap(addr, size, ... MAP_FIXED|MAP_HUGETLB ...)" will remove the
> previous mapping? I couldn't find that anywhere.

I found that this was the way the Linux kernel behaved while debugging 
the original crash.

I filed a bug against the kernel:
https://bugzilla.kernel.org/show_bug.cgi?id=57951

Unfortunately, this seems to be the intended behavior.
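
For readers who want to see the pattern in isolation, here is a minimal C
sketch of the reserve-then-commit sequence discussed in this thread. It is not
the HotSpot code; the function names are hypothetical, and it assumes a Linux
system where MAP_HUGETLB is available in <sys/mman.h>. For the huge page
attempt to have a chance of succeeding, addr and size would also need to be
aligned to the huge page size, which this sketch does not arrange.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space without committing memory. Returns NULL on failure. */
void *reserve_region(size_t size) {
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Commit part of a reserved range: try huge pages first, then small pages.
 * Returns 1 if huge pages were used, 0 for small pages, -1 on failure. */
int commit_region(void *addr, size_t size) {
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED;

    if (mmap(addr, size, prot, flags | MAP_HUGETLB, -1, 0) != MAP_FAILED)
        return 1;

    /* Window of vulnerability: as described in this thread, the failed call
     * above may already have removed the reservation at [addr, addr + size),
     * so another thread's mmap could land here before the fallback runs. */
    if (mmap(addr, size, prot, flags, -1, 0) != MAP_FAILED)
        return 0;

    return -1;
}
```

In a single-threaded program the fallback simply re-establishes the mapping;
the problem only shows up when another thread calls mmap inside the window
marked in the comment.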

>   (I think this may
> also be one of the reasons why we sometimes lose the guard page
> protection for a thread. We thought we fixed that problem ("7107135 :
> Stack guard pages are no more protected after loading a shared library
> with executable stack",
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7107135) but maybe
> we only fixed one reason? What if a thread places its stack into the
> area of the removed mapping and afterwards, when the memory is mmapped
> with small pages, the guard pages of the thread get read/write
> permissions?)

Sounds reasonable. Also, note that a failing mmap(... MAP_FIXED ...) 
with small pages will lose the reservation. The following change tries 
to detect that and gracefully shut down the JVM:
http://hg.openjdk.java.net/jdk7u/jdk7u40/hotspot/rev/a1a295252814

thanks,
StefanK

>
>
> On Tue, Jul 2, 2013 at 6:57 PM, Stefan Karlsson
> <stefan.karlsson at oracle.com> wrote:
>> http://cr.openjdk.java.net/~stefank/8007074/webrev.00/
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007074
>>
>> The default way of using Large Pages in HotSpot on Linux (UseHugeTLBFS) is
>> broken. This is causing a number of crashes in different subsystems of the
>> JVM.
>>
>>
>> Bug Description
>> ===============
>>
>> The main reason for this bug is that mmap(addr, size, ...
>> MAP_FIXED|MAP_HUGETLB ...) will remove the previous mapping at [addr,
>> addr+size) when we run out of large pages on Linux.
>>
>> This affects different parts of the JVM, but the most obvious is the
>> allocation of the Java heap:
>>
>> When the JVM starts it reserves a memory area for the entire Java heap. We
>> use mmap(...MAP_NORESERVE...) to reserve a contiguous chunk of memory that
>> no other
>> subsystem of the JVM, or Java program, will be allowed to mmap into.
>>
>> The reservation of the memory only reflects the maximum possible heap size,
>> but often a smaller heap size is used if the memory pressure is low. The
>> part of
>> the heap that is actually used is committed with mmap(...MAP_FIXED...). When
>> the heap is growing we commit a consecutive chunk of memory after the
>> previously committed memory. We rely on the fact that no other thread will
>> mmap into the reserved memory area for the Java heap.
>>
>> The actual committing of the memory is done by first trying to allocate
>> large pages with mmap(...MAP_FIXED|MAP_HUGETLB...), and if that fails we
>> call mmap with the same parameters but without the large pages flag
>> (MAP_HUGETLB).
>>
>> Just after we have failed to mmap large pages and before the small pages
>> have been mmapped, there's an unmapped memory region in the middle of the
>> Java heap, where other threads might mmap into. When that happens we get
>> memory trashing and crashes.
>>
>>
>> Large Pages in HotSpot - on Linux
>> =================================
>>
>> Currently, before the bug fix, HotSpot supports three ways of allocating
>> large pages on Linux.
>> 1) -XX:+UseSHM - Commits the large pages upfront when the memory is
>> reserved.
>>
>> 2) -XX:+UseHugeTLBFS - This is the broken implementation. It's also the
>> default way large pages are allocated. If the OS is correctly configured, we
>> get this kind of large pages for three different reasons:
>> 2.1) The user has not specified any large pages flags
>> 2.2) The user has specified -XX:+UseLargePages
>> 2.3) The user has specified -XX:+UseHugeTLBFS
>>
>> 3) Transparent Huge Pages - is supported on recent Linux Kernels. The user
>> can choose to configure the OS to:
>> 3.1) completely handle the allocation of large pages, or
>> 3.2) let the JVM advise where it would be good to allocate large pages.
>> There exists code for this today, which is guarded by the (2)
>> -XX:+UseHugeTLBFS flag.
>>
>>
>> The Proposed Patch
>> ==================
>>
>> 4) Create a new flag -XX:+UseTransparentHugePages, and move the transparent
>> huge pages advice in (3.2) out from the (2) -XX:+UseHugeTLBFS code.
>>
>> 5) Make -XX:+UseTransparentHugePages the default way to allocate large pages
>> if the OS supports them. It will be the only kind of large pages we'll use
>> if the user has not specified any large pages flags.
>>
>> 6) Change the order of how we choose the kind of large pages when
>> -XX:+UseLargePages has been specified. It used to be UseHugeTLBFS then
>> UseSHM, now it's UseTransparentHugePages, then UseHugeTLBFS, then UseSHM.
>>
>> 7) Implement a workaround fix for the (2) -XX:+UseHugeTLBFS implementation.
>> With the fix the large pages are committed upfront when they are reserved.
>> It's mostly the same way we do it for the older (1) -XX:+UseSHM large pages.
>> This change will fix the bug, but has a couple of drawbacks:
>> 7.1) We have to allocate the entire large pages memory area when it is
>> reserved instead of when parts of it are committed.
>> 7.2) We can't dynamically shrink or grow the used memory in the large pages
>> areas.
>> If these restrictions are not suitable for the user, then (3)
>> -XX:+UseTransparentHugePages could be used instead.
>>
>> 8) Ignore -XX:LargePageSizeInBytes on Linux since the OS doesn't support
>> multiple large page sizes and both the old code and new code are broken if
>> the user is allowed to set it to some other value than the OS-chosen value.
>> Warn if the user specifies a value different than the OS default value.
>>
>>
>> Testing
>> =======
>>
>> New unit tests have been added. These can be run in a non-product build
>> with:
>> java -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests <large pages
>> flags> -version
>>
>> unit tests: with and without large pages on Linux, Windows, Solaris, x86,
>> x64, sparcv9.
>> jprt: default
>> jprt: -XX:+UseLargePages
>> jprt: -XX:+UseLargePages -XX:-UseCompressedOops
>> vm.quick.testlist, vm.pcl.testlist, vm.gc.testlist: multiple platforms, with
>> large pages on all major GCs with and without compressed oops.
>> SPECjbb2005 performance runs: on Linux x64 with -XX:+UseHugeTLBFS before and
>> after the patch.
>> Kitchensink: 3 days on Linux x64
>>
>>
>> thanks,
>> StefanK
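
For reference, the selection order from item 6 of the quoted proposal could be
sketched in C as follows. The enum and function names are hypothetical and are
not taken from the patch; the three boolean inputs stand in for whatever checks
decide whether each kind of large pages is usable.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LP_NONE, LP_TRANSPARENT, LP_HUGETLBFS, LP_SHM } large_page_kind;

/* Pick the first usable kind in the proposed order:
 * UseTransparentHugePages, then UseHugeTLBFS, then UseSHM. */
large_page_kind choose_large_page_kind(bool thp_ok, bool hugetlbfs_ok,
                                       bool shm_ok) {
    if (thp_ok) return LP_TRANSPARENT;
    if (hugetlbfs_ok) return LP_HUGETLBFS;
    if (shm_ok) return LP_SHM;
    return LP_NONE;
}

/* Map a kind back to the flag name used in this thread. */
const char *large_page_flag(large_page_kind kind) {
    switch (kind) {
    case LP_TRANSPARENT: return "-XX:+UseTransparentHugePages";
    case LP_HUGETLBFS:   return "-XX:+UseHugeTLBFS";
    case LP_SHM:         return "-XX:+UseSHM";
    case LP_NONE:        return "";
    }
    assert(!"unknown large_page_kind");
    return "";
}
```

Before the patch, the equivalent order would have started at UseHugeTLBFS and
then fallen back to UseSHM.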


