RFR(s): 8212173: Thread._stack_base/_stack_size initialized too late for new threads

Thomas Stüfe thomas.stuefe at gmail.com
Tue Oct 23 06:42:16 UTC 2018


Thank you David!

This is unfortunately unsolved still for Windows/Solaris/AIX, so one
should not iterate the thread list and query the stack size.

I wonder whether we should query the OSThread::state and only return
threads whose state is >= INITIALIZED. But that is for a different
patch.

Thanks, Thomas



On Tue, Oct 23, 2018 at 4:07 AM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Thomas,
>
> This looks good to me - thanks. Setting the stack base and size first up
> is more robust all round and as you note happens before the adding to
> the threads-list in some cases.
>
> Getting rid of the solaris-specific os::initialize_thread is a good
> clean up!
>
> I've run this through our internal tier 1-3 testing again and it all passed.
>
> Thanks,
> David
>
> On 23/10/2018 2:56 AM, Thomas Stüfe wrote:
> > Dear all,
> >
> > may I have please reviews for the following fix:
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8212173
> > cr: http://cr.openjdk.java.net/~stuefe/webrevs/8212173-thread-stack-and-base-initialized-too-late/webrev.02/webrev/
> >
> > See also prior mail discussions: [1]
> >
> > Wwhen iterating the thread list, threads may be visible which have
> > stack base/size not yet set. Calling stack_base() will yield NULL,
> > stack_size() 0 or it will assert. The reason for this is that there
> > exist a race: Thread::_stack_base and ::_stack_size are initialized
> > too late, when the Thread has already been added to the threads list.
> >
> > My solution to this problem is basically to move the initialization of
> > Thread::_stack_base and ::_stack_size to an earlier point at which the
> > thread has not yet been added to the threads list.
> >
> > Unfortunately, this is more difficult that it sounds. Retrieving
> > thread stack base and size needs to be done from within the thread in
> > question, and only on Linux and BSD we have the opportunity to execute
> > code in the new child thread before its Thread object is added to the
> > thread list. So, my patch solves the problem completely only on Linux
> > and BSD. On Windows, Aix and Solaris it only minimizes the chance for
> > it to happen.
> >
> > For more details, please see bug description and the discussion in [1].
> >
> > ----
> >
> > My changes in detail:
> >
> > - Thread::record_stack_base_and_size(): I removed all code which needs
> > the surrounding Thread object being initialized, or which needs
> > Thread::current() to be set. That makes the function safe to be called
> > from the very start of the child thread's life (start of
> > thread_native_entry()), where nothing is initialized yet.
> >
> > - I moved the call to Thread::record_stack_base_and_size() out of all
> > <ChildThreadClass>::run() functions into the very start of the
> > thread_native_entry() on every platform.
> >
> > - The parts I moved out of Thread::record_stack_base_and_size() are:
> > NMT thread stack initialization and Logging. I needed to find a new
> > place for them. I wanted them to run at the end of the child thread
> > initialization, when the child Thread is already initialized and
> > Thread::current is initialized too. So I moved them to just before
> > <ChildThreadCLass>::run() is called.
> >
> > -  To prevent code duplication, I changed:
> >
> > public virtual Thread::run()
> >
> > to
> >
> > public (nonvirtual) Thread::call_run();
> > protected virtual Thread::run() = 0;
> >
> > Thread::call_run() now is the place to put common,
> > platform-independent and <ChildThreadClass>-independent
> > initializations. There I put the NMT stuff and logging. After
> > <ChildThreadClass>::run() returns, I put common cleanup checks.
> >
> > Other changes:
> >
> > - We had a function os::initialize_thread() which only existed for the
> > sole benefit of Solaris, where if gives the platform a stab at
> > correcting the stack base, size for a certain corner case when we run
> > on primordial thread. On all other platform-cpu-combinations, this
> > function was empty. I removed all empty implementations and renamed
> > the function to the much more fitting, Solaris only
> > "os::Solaris::correct_stack_boundaries_for_primordial_thread(Thread*
> > thr)".
> >
> > - When <ChildClassThread>::run() returns, the Thread object may have
> > been deleted by the child class run() function. To avoid errors I set
> > the thread pointer to zero (where possible) to prevent accessing it to
> > reference members.
> >
> > - Since Thread::record_stack_base_and_size() does not register the
> > thread stack with NMT anymore, I needed to fix those places where the
> > former is called for attached threads and add that NMT registration
> > manually.
> >
> > - I made Thread::clear_thread_current() a static function since it
> > needs no members of Thread (unlike Thread::initialize_thread_current()
> > which needs the this pointer to hook it into TLS). That makes it
> > possible to use clear_thread_current() without having a valid Thread
> > object, which is the case if <ChildClassThread>::run() deleted the
> > Thread.
> >
> > ----
> >
> > That's about it.
> >
> > The change ran through jdk-submit tests without errors.
> >
> > Thank you for reviewing.
> >
> > ..Thomas
> >
> > [1] http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-October/030516.html
> >


More information about the hotspot-runtime-dev mailing list