RFR(M): 8169373: Work around linux NPTL stack guard error.

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Tue Nov 22 11:02:09 UTC 2016


> There are four webrevs here:
> 

While working on this, I detected more and more problems. 
Because feature complete etc I kept them separate.

Webrevs for the original intended fix: NPTL error workaround.
Webrev.02 incorporates improvements from review process.
It was agreed that this is a bug fix that should go to 9.
> webrev.01/         2016-Nov-10 15:33:11    -      Directory
> webrev.02/         2016-Nov-16 14:34:26    -      Directory

Cleanup of stack handling code proposed by David.  No functional changes.
> webrev.cleanup/    2016-Nov-21 10:11:36    -      Directory

Additional changes to fix the missing space for VM guard pages, 
removing the wrong OS guard page from compiler threads, and 
bringing more sense into the handling of minimal sizes (the initial 
constant should reflect the size needed for stack frames excluding 
guard pages).
I would like to bring these to 9 still.  Else I have to do some other,
changes (increasing minimal sizes and adding the one check for compiler 
threads.)
> webrev.minStSz/    2016-Nov-16 14:39:09    -      Directory

Best regards,
  Goetz.


> 
> and I'm not sure which I should be looking at... :-)
> 
> 
> > The code is identical in all the os_cpu files.
> > (Curently it's edited on top of the other change).
> >
> > I would understand if you don't like the other changes I propose
> > For 9.  But I still think this is just how you expect this to work, the
> > minimum stack size is independent of page size and thus should
> > be specified without considering the guard pages. And therefore
> > it should go to 9 where we already changed a lot wrt. stack sizes.
> >
> > Coleen, would you mind pointing me to the bugs Dan and Ron are
> > working on?  I'd like to understand the possible conflicts.
> 
> The fix for 8140520 was not done for Win* because we (Ron and I)
> could not get reliable stack size behavior out of Win* like we
> got for the *NIX platforms. I don't remember if I've filed the
> follow up bugs for 8140520, but, in any case, I'm not working
> on thread stack size issues at the moment.
> 
> Dan
> 
> 
> >
> > Best regards,
> >    Goetz.
> >
> >> -----Original Message-----
> >> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-
> >> bounces at openjdk.java.net] On Behalf Of David Holmes
> >> Sent: Sonntag, 20. November 2016 02:21
> >> To: Coleen Phillimore <coleen.phillimore at oracle.com>; hotspot-runtime-
> >> dev at openjdk.java.net
> >> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
> >>
> >> On 19/11/2016 12:40 AM, Coleen Phillimore wrote:
> >>>
> >>> On 11/18/16 1:34 AM, David Holmes wrote:
> >>>> Hi Goetz,
> >>>>
> >>>> On 17/11/2016 1:52 AM, Lindenmaier, Goetz wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I addressed the additional topics, but in an extra change.
> >>>>>
> >>>>> First, I rounded up the NPTL bug workaround:
> >>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
> >> stackFix/webrev.02
> >>>>> I fixed os::Posix::describe_pthread_attr() and current_stack_region()
> >>>>> In the os_cpu files.
> >>>>> Interestingly, in zero,  current_stack_region() is already fixed.
> >>>> So on Linux, pthread_attr_getstack is also broken - in that it returns
> >>>> the address at the end of the guard section, instead of the usable
> stack?
> >>>>
> >>>> Can you add error checking on the pthread_attr_getguardsize call
> please.
> >>>>
> >>>> The duplication of the current_stack_region static method in the
> >>>> os-cpu files is horrible. This should be placed as a private method in
> >>>> the os class and moved to os_linux.cpp. RFE for 10.
> >>> By duplication do you mean this bit of code?
> >>>
> >>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
> >>
> stackFix/webrev.02/src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp.udiff.html
> >>>
> >>> + // Work around NPTL stack guard error.
> >>> + size_t guard_size = 0;
> >>> + pthread_attr_getguardsize(&attr, &guard_size);
> >>> + *bottom += guard_size;
> >>> + *size -= guard_size;
> >>> +
> >>>
> >>> This should be cleaned up with this patch and not an RFE for 10.
> >> I meant the entire current_stack_region method to which this new
> >> fragment has been added.
> >>
> >> David
> >>
> >>> I don't know the NPTL issues well enough to be a reviewer for this, but
> >>> I don't like something I don't know about distributed in many places.
> >>>
> >>>>
> >>>>> Then I edited an extra webrev to fix
> >>>>>    - unnecessary OS guard page on Compiler threads
> >>>>>    - unpredictable minimal stack sizes because of varying (guard) page
> >>>>> sizes:
> >>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
> >> stackFix/webrev.minStSz/
> >>>>>
> >>>>> Compiler threads now don't get OS guard pages.  Saves one page per
> >>>>> compiler thread.
> >>>>>
> >>>>> Also, the constants in the os_<os>_<cpu>.cpp files now list minimum
> >>>>> space
> >>>>> required for frames.  I basically reduced them by the size of the
> >>>>> guard zones.
> >>>>> Where minimum stack sizes are computed, the page size is known
> and
> >> the
> >>>>> minimal value is extended by the required space for the guard pages.
> >>>>> As compiler threads
> >>>>> don't bang, no space for the shadow zone is needed.
> >>>>>
> >>>>> Should I add this additional change to 8169373?  As this only deals
> >>>>> with minimal
> >>>>> stack sizes, and reduces these sizes, I don't see a relevant risk.
> >>>>> Only on systems with large pages the minimal sizes now might be
> >>>>> bigger, but
> >>>>> then the old sizes would have led to immediate stack overflows.
> >>>> Sorry this is getting a bit too divergent and disruptive to me for
> >>>> this stage of 9. You are redefining what the "minimum stack" values
> >>>> mean as well as changing their actual value. (BTW I'm sure there is a
> >>>> test of these values that would need updating.)
> >>>>
> >>>> I can't tell if it is right/wrong, necessary/unnecessary - but given
> >>>> the recent work done in this area I'm somewhat cautious about
> changing
> >>>> it again - in 9.
> >>> I think the additional changes should also wait for 10.  I like the
> >>> change to not have shadow pages for the compiler threads (not
> including
> >>> AOT compiler threads, right?), but they also collide with work that I
> >>> think Dan is going to do for windows.   We don't want the os::Posix
> >>> functions in these files.
> >>>
> >>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
> >>
> stackFix/webrev.minStSz/src/os_cpu/linux_x86/vm/os_linux_x86.cpp.udiff.
> >> html
> >>>
> >>> Also, the work that Dan and Ron did in this area didn't change the sizes
> >>> and I'd want Dan to have time to comment on this.
> >>>
> >>> Thanks,
> >>> Coleen
> >>>
> >>>> David
> >>>>
> >>>>> Best regards,
> >>>>>    Goetz.
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
> >>>>>> Sent: Dienstag, 15. November 2016 00:54
> >>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> >> runtime-
> >>>>>> dev at openjdk.java.net
> >>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
> >> error.
> >>>>>> Hi Goetz,
> >>>>>>
> >>>>>> On 15/11/2016 3:12 AM, Lindenmaier, Goetz wrote:
> >>>>>>> Hi David,
> >>>>>>>
> >>>>>>>> Did you really intend to increase the real minimum stack from
> 128K
> >> to
> >>>>>>>> 256K ? (on a 64K page system)
> >>>>>>> I finally figured why I have to set such high minimum values.
> >>>>>>>
> >>>>>>> A CompilerThread is a JavaThread. It's getting yellow, red etc
> >>>>>>> pages at the top. As these are 64K on 64K systems, the compiler
> >>>>>>> thread stack has to be this big. Thus you actually need similar
> >>>>>>> values once you run a linuxx86_64 VM on a system with 64K pages.
> >>>>>>> Does this exist?
> >>>>>> No idea, sorry.
> >>>>>>
> >>>>>>> But, this means that there is no use of the OS guard pages
> >>>>>>> configured in default_guard_size().  This is called with
> >>>>>>> 'compiler_thread'
> >>>>>>> for CompilerThreads.
> >>>>>>>
> >>>>>>> We should probably also return '0' in that case.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>> I think I agree. As you note a CompilerThread is-a JavaThread even
> >>>>>> though the ThreadType enum treats them as completely distinct:
> >>>>>>
> >>>>>>     enum ThreadType {
> >>>>>>       vm_thread,
> >>>>>>       cgc_thread,        // Concurrent GC thread
> >>>>>>       pgc_thread,        // Parallel GC thread
> >>>>>>       java_thread,
> >>>>>>       compiler_thread,
> >>>>>>       watcher_thread,
> >>>>>>       os_thread
> >>>>>>     };
> >>>>>>
> >>>>>> so anything with VM supplied guard pages doesn't need the glibc
> ones
> >> as
> >>>>>> well.
> >>>>>>
> >>>>>> I'm struggling to understand why os::Linux::default_guard_size is
> >>>>>> defined per CPU type? I would not expect this to need to vary based
> >> on
> >>>>>> CPU at all. ??
> >>>>>>
> >>>>>> Thanks,
> >>>>>> David
> >>>>>>
> >>>>>>> Best regards,
> >>>>>>>    Goetz.
> >>>>>>>
> >>>>>>> size_t os::Linux::default_guard_size(os::ThreadType thr_type) {
> >>>>>>>    // Creating guard page is very expensive. Java thread has HotSpot
> >>>>>>>    // guard page, only enable glibc guard page for non-Java threads.
> >>>>>>>    return (thr_type == java_thread ? 0 : page_size());
> >>>>>>> }
> >>>>>>>



More information about the hotspot-runtime-dev mailing list