RFR(s): 8077276: allocating heap with UseLargePages and HugeTLBFS may trash existing memory mappings (linux)
Thomas Stüfe
thomas.stuefe at gmail.com
Tue May 5 12:32:19 UTC 2015
Hi Charlie,
OpenJDK on Linux supports two ways to explicitly allocate large pages: one
uses System V shared memory and the other uses mmap. You switch between them
with UseSHM and UseHugeTLBFS, respectively. I think UseHugeTLBFS takes
precedence.
Regards, Thomas
On Tue, May 5, 2015 at 2:05 PM, charlie hunt <charlie.hunt at oracle.com>
wrote:
> Hi Kirk,
>
> Thanks for chiming in! :-)
>
> I will advise our performance team to continue to recommend that THP be
> disabled. As you know, the root of the problem is not in the JVM.
>
> Perhaps we (HotSpot) might consider moving the UseTransparentHugePages
> flag to a non-product flag until THP is modified so that it can be used by
> the JVM without (severe) degradation. That might help folks resist the
> temptation to enable THP at the OS level, since we are “advertising” a
> command-line option that suggests it may work well. Of course this would be
> something we would do outside of the work being reviewed here.
>
> Thomas (Stüfe): Could you remind us what +UseHugeTLBFS enables?
>
> thanks,
>
> charlie
>
> > On May 5, 2015, at 4:02 AM, Kirk Pepperdine <kirk at kodewerk.com> wrote:
> >
> > Hi Stefan,
> >
> > I figured it must be a known issue but I thought I’d take the
> opportunity to add what we’re seeing out here in the wild.
> >
> > Regards,
> > Kirk
> >
> > On May 5, 2015, at 10:11 AM, Stefan Karlsson <stefan.karlsson at oracle.com>
> wrote:
> >
> >> Hi Kirk,
> >>
> >> On 2015-05-05 08:30, Kirk Pepperdine wrote:
> >>> Hi all,
> >>>
> >>> We’ve been having a running discussion in friends at jclarity.com
> regarding THP on Linux. Our recommendation (and it has been for some time) is
> to turn them off as they can be responsible for some very long pause times.
> Charlie Hunt is probably the most knowledgeable person you have internally
> regarding the problem.
> >>>
> >>> Here is a GC log entry from a current discussion.
> >>>
> >>> 2015-04-03T01:26:22.488-0400: 196943.302: [GC [PSYoungGen:
> 3890145K->1557919K(5087744K)] 14069848K->12980486K(16664064K), 47.7667670
> secs] [Times: user=0.00 sys=584.96, real=47.77 secs]
> >>
> >> To be clear, the UseHugeTLBFS flag, that we discuss in this review,
> does not turn on Transparent Huge Pages.
> >>
> >> It's a known issue that THP are causing problems:
> >>
> https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
> >>
> >> For a short period of time we had UseTransparentHugePages turned on by
> default, but it was reverted with:
> >> https://bugs.openjdk.java.net/browse/JDK-8024838 - Significant
> slowdown due to transparent huge pages
> >>
> >> You can check if you have THP by reading this file:
> >> /sys/kernel/mm/transparent_hugepage/enabled
> >>
> >> Thanks,
> >> StefanK
> >>
> >>>
> >>> Kind regards,
> >>> Kirk
> >>>
> >>>
> >>>
> >>> On May 4, 2015, at 10:05 PM, Coleen Phillimore <
> coleen.phillimore at oracle.com> wrote:
> >>>
> >>>> Hi Thomas,
> >>>>
> >>>> I reviewed the latest version. Thanks to Stefan who reviewed the
> earlier versions and suggested changes.
> >>>>
> >>>> This looks great. I am not an expert at large pages (or even that
> knowledgeable) but your change addresses the safety concern that you had
> and the code appears to do what you said.
> >>>>
> >>>> Is there a follow-up RFE to clean out the last use of and remove the
> requested address parameter to os::reserve_memory()? This
> os::reserve_memory() calling os::pd_reserve_memory layering is really
> annoying and seems not helpful.
> >>>>
> >>>> Thanks,
> >>>> Coleen
> >>>>
> >>>> (also, I will sponsor it for you if you send me an hg exported
> version of your commit)
> >>>>
> >>>> On 4/30/15, 3:33 AM, Thomas Stüfe wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I realize that this patch may look more complicated than it is.
> >>>>>
> >>>>> Basically, the problem is that under certain conditions, memory is
> >>>>> allocated using mmap(addr, MAP_FIXED) for an initial reservation
> (e.g. java
> >>>>> heap), which may trash existing mappings.
> >>>>>
> >>>>> The fix is basically just to remove the MAP_FIXED flag for the
> initial
> >>>>> allocation.
> >>>>>
> >>>>> Fix looks more complicated than this because the test functions were
> >>>>> expanded to add regression tests for this fix.
> >>>>>
> >>>>> Please tell me if I should dumb down this fix or add explanations.
> >>>>>
> >>>>> Thanks & Kind regards, Thomas
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 30, 2015 at 8:39 AM, David Holmes <
> david.holmes at oracle.com>
> >>>>> wrote:
> >>>>>
> >>>>>> On 30/04/2015 4:59 AM, Thomas Stüfe wrote:
> >>>>>>
> >>>>>>> Could I have another reviewer, please, and a sponsor?
> >>>>>>>
> >>>>>> Latest webrev is:
> >>>>>>
> >>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.05/webrev
> >>>>>>
> >>>>>> Sorry not something I can review in depth.
> >>>>>>
> >>>>>> David
> >>>>>>
> >>>>>>
> >>>>>> Thanks!
> >>>>>>> Thomas
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thursday, April 9, 2015, Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>> please review this fix to huge page allocation on Linux.
> >>>>>>>>
> >>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8077276
> >>>>>>>> webrev:
> >>>>>>>>
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/
> >>>>>>>>
> >>>>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() first
> establishes a
> >>>>>>>> mapping with small pages over the whole requested range, then
> exchanges
> >>>>>>>> the
> >>>>>>>> parts of the mapping which are aligned to large page size with
> large
> >>>>>>>> pages.
> >>>>>>>>
> >>>>>>>> Now, when establishing the first mapping, it uses
> os::reserve_memory()
> >>>>>>>> with a potentially non-null req_addr. In that case,
> os::reserve_memory()
> >>>>>>>> will do a mmap(MAP_FIXED).
> >>>>>>>>
> >>>>>>>> This will trash any pre-existing mappings.
> >>>>>>>>
> >>>>>>>> Note that I could not willingly reproduce the error with an
> unmodified
> >>>>>>>> VM.
> >>>>>>>> But I added a reproduction case which demonstrated that if one
> were to
> >>>>>>>> call
> >>>>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() with a
> non-NULL
> >>>>>>>> req_addr, existing mappings would be trashed. Depending on where
> we are
> >>>>>>>> in
> >>>>>>>> address space, we also would overwrite libc structures and crash
> >>>>>>>> immediately.
> >>>>>>>>
> >>>>>>>> The repro case is part of the change, see changes in
> >>>>>>>> test_reserve_memory_special_huge_tlbfs_mixed(), and can be
> executed with:
> >>>>>>>>
> >>>>>>>> ./images/jdk/bin/java -XX:+UseLargePages -XX:+UseHugeTLBFS
> -XX:-UseSHM
> >>>>>>>> -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
> >>>>>>>>
> >>>>>>>> The fix: instead of using os::reserve_memory(),
> >>>>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() now calls
> mmap()
> >>>>>>>> directly with the non-NULL req_addr, but without MAP_FIXED. This
> means
> >>>>>>>> the
> >>>>>>>> OS may do its best to allocate at req_addr, but will not trash any
> >>>>>>>> mappings. This also follows the pattern in
> >>>>>>>> os::Linux::reserve_memory_special_huge_tlbfs_only(), which is a
> sister
> >>>>>>>> function of os::Linux::reserve_memory_special_huge_tlbfs_mixed().
> >>>>>>>>
> >>>>>>>> Note also discussion at:
> >>>>>>>>
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-April/017823.html
> >>>>>>>> .
> >>>>>>>>
> >>>>>>>> Thanks for reviewing!
> >>>>>>>>
> >>>>>>>> Kind Regards, Thomas
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>
> >
>
>
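
Note: the fix described in the quoted review request passes req_addr to
mmap() as a hint instead of using MAP_FIXED. The following is a minimal
sketch of that pattern in plain C, not the HotSpot code itself. The helper
name reserve_at_hint and its "return NULL if the exact address was not
granted" policy are assumptions made for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not a HotSpot function): reserve `bytes` of address
 * space, treating `req_addr` as a hint only. Because MAP_FIXED is absent,
 * the kernel never replaces an existing mapping; if the requested range is
 * occupied it places the new mapping somewhere else.
 *
 * Returns the mapping on success. Returns NULL if mmap() fails, or if a
 * non-NULL req_addr was given and the kernel chose a different address
 * (in which case the stray mapping is released again).
 */
static void *reserve_at_hint(void *req_addr, size_t bytes) {
    assert(bytes > 0);

    void *p = mmap(req_addr, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    if (req_addr != NULL && p != req_addr) {
        /* The hint was not honored: undo and report failure to the caller. */
        munmap(p, bytes);
        return NULL;
    }
    return p;
}
```

With this pattern, a request that collides with an existing mapping fails
cleanly and the caller can retry elsewhere, whereas the same call with
MAP_FIXED would silently replace whatever was mapped at req_addr.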