Question about thread initialization

Thomas Stüfe thomas.stuefe at gmail.com
Thu Oct 18 07:52:50 UTC 2018


Hi David,
On Thu, Oct 18, 2018 at 3:29 AM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Thomas,
>
> On 17/10/2018 10:33 PM, Thomas Stüfe wrote:
> > This proves to be more difficult than I initially thought :/
> >
> > The problem is that we need to initialize stack base, size from within
> > the newly started thread itself: for stack_base, this is obvious. And
> > even stack_size - which should usually be equal to the requested stack
> > size known to the parent thread - might be larger, actually.
> >
> > But actually, on some platforms the thread-start-handshake is
> > differently implemented than on Linux. On Linux, native_thread_entry()
> > starts runnning right away after pthread_create(), and then waits for
> > the handshake. On Solaris/AIX/Windows we simply create the new thread
> > in suspended state, and in os::start_thread(), we resume it. So
> > native_thread_entry() never ran.
>
> Hmmm my (incorrect) recollection was that the "start suspended" ability
> only handled the first part of the handshake, I didn't realize it
> removed it all. I also didn't recall it being used other than on Solaris
> - but I've never looked at the AIX code (I assumed it would be a copy of
> Linux).
>

I think I wrote that part for AIX but cannot recollect anymore why we
do it the solaris/windows way instead of the linux way. Maybe it just
seemed more natural. Wild guess, the Linux handshake code feels like a
workaround from pre-nptl times, maybe because threads could not be
started in a suspended state?

> > Now we have a hen-end-egg problem: we want to call Threads::add()
> > before os::start_thread() (I suppose, since the Thread may be short
> > lived and we do not want to add a possibly dead thread to the Threads
> > list). But we do not have the stack dimensions, since the thread on
> > these platforms never ran and hence had no opportunity to initialize
> > Thread::_stack_base/size.
>
> Yes there is an unavoidable window where the thread is visible to
> "iterators" but has not yet initialized itself to the point where it is
> necessarily safe to interact with. Safepoint based iteration is not an
> issue, but we also seem to have a number of non-safepoint-based
> iterations occurring.
>
> But if we can initialise the stack base and size as the first thing in
> thread_native_entry, that should at least make the window as small as
> possible - no? And also handle the crash reporting case.

I think this is what I will do. That will mitigate the problem
somewhat for solaris, windows and solve it completely for Linux and
Mac.

>
> > Not really sure what to do here: I guess one could replace the resume
> > mechanism on Solaris/Windows/AIx with the type of monitor-based
> > handshake we do on Linux. But that seems heavy handed and may also
> > make thread creation slower.
>
> Other than Windows that would be the result if we standardized on
> POSIX-based threading. But that's not an insignificant effort in itself,
> and the performance implications would still need examining.
>

Yes, this looks like time consuming work, and I do not have much time
left on this issue. In general I wonder whether there would be really
any performance implications with the handshake technique - and if
there are, why we use it then on Linux instead of cresting the thread
in suspended state (I know, it is probably all historical).

> BTW by chance this bug seems to show code that is both susceptible to
> finding new threads too soon, and also not preventing threads from
> terminating whilst in use:
>
> https://bugs.openjdk.java.net/browse/JDK-8212207
>

Seems similar, yes. Wonder why we crash though. Should we not simply
return EINVAL if the pthread id is invalid?

..Thomas

> Cheers,
> David
>
> > Cheers, Thomas
> >
> > On Mon, Oct 15, 2018 at 9:58 PM David Holmes <david.holmes at oracle.com> wrote:
> >>
> >> Hi Thomas,
> >>
> >> I'm not sure I see a reason for splitting. Just move
> >> record_stack_base_and_size after the essential initialization in
> >> thread_native_entry - probably just before the:
> >>
> >> log_info(os, thread)("Thread is alive (tid: ...
> >>
> >> as long as it happens before the handshake to the creating thread it
> >> should fix the problem of being able to see the new thread before its
> >> stack is set. The early crash problem is improved even if not perfect
> >> and better handled in the error reporter IMHO - as it needs to be
> >> resilient anyway.
> >>
> >> Cheers,
> >> David
> >>
> >> On 16/10/2018 1:41 AM, Thomas Stüfe wrote:
> >>> Hi David,
> >>>
> >>> On Mon, Oct 15, 2018 at 7:32 AM David Holmes <david.holmes at oracle.com> wrote:
> >>>>
> >>>> On 13/10/2018 5:49 PM, Thomas Stüfe wrote:
> >>>>> On Sat, Oct 13, 2018 at 8:36 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> >>>>>>
> >>>>>> Hi David,
> >>>>>>
> >>>>>> On Sat, Oct 13, 2018 at 1:42 AM David Holmes <david.holmes at oracle.com> wrote:
> >>>>>>>
> >>>>>>> Hi Thomas,
> >>>>>>>
> >>>>>>> On 13/10/2018 3:56 AM, Thomas Stüfe wrote:
> >>>>>>>> Hi  David,
> >>>>>>>>
> >>>>>>>> thank you for your brain cycles.
> >>>>>>>>
> >>>>>>>> On Fri, Oct 12, 2018 at 12:39 PM David Holmes <david.holmes at oracle.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Thomas,
> >>>>>>>>>
> >>>>>>>>> On 12/10/2018 6:18 PM, Thomas Stüfe wrote:
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> a small question.
> >>>>>>>>>>
> >>>>>>>>>> JVM_StartThread calls new JavaThread()
> >>>>>>>>>>        JavaThread::JavaThread() calls os::create_thread()
> >>>>>>>>>>          os::create_thread() starts the new thread and waits for the
> >>>>>>>>>> handshake, then returns
> >>>>>>>>>>
> >>>>>>>>>> Back in JVM_StartThread, we call JavaThread::prepare(), which adds the
> >>>>>>>>>> new thread to the Threads list. By that time, the new thread is
> >>>>>>>>>> already running, but how far it has gotten is unknown.
> >>>>>>>>>
> >>>>>>>>> Right - though the new thread can't enter run() until after it has been
> >>>>>>>>> prepared and "started". There are two parts to the handshake:
> >>>>>>>>>
> >>>>>>>>>          Parent thread             New Thread
> >>>>>>>>>         start new Thread
> >>>>>>>>>         wait for new thread
> >>>>>>>>>                                   signal parent
> >>>>>>>>>                                   wait for parent
> >>>>>>>>>         prepare new thread
> >>>>>>>>>         "start" new thread
> >>>>>>>>>         signal new thread
> >>>>>>>>>                                    run()
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Ah, I see. The new thread is taken off the leash only at the end of
> >>>>>>>> JVM_StartThread(), when Thread::start() is called and the final part
> >>>>>>>> of the handshake is completed.
> >>>>>>>>
> >>>>>>>>>> The new thread's stack dimensions are set from within Thread::run()
> >>>>>>>>>> (for some reason, every child class does this on its own?) by calling
> >>>>>>>>>> Thread::record_stack_base_and_size(). So, after the handshake with its
> >>>>>>>>>> parent thread. Why?
> >>>>>>>>>
> >>>>>>>>> Good question. Undoubtedly historical. :)
> >>>>>>>>>
> >>>>>>>>>> This means we have a little race: in the Threads list there may be
> >>>>>>>>>> threads which have been just created and Thread::run() did not yet get
> >>>>>>>>>> around to set the stack size. In tests I stumbled over this, very
> >>>>>>>>>> rarely, when iterating Threads to check the stack sizes.
> >>>>>>>>>
> >>>>>>>>> Hmmm. Threads that are still _thread_new should really be invisible
> >>>>>>>>> until more fully initialized. How exactly were you iterating the
> >>>>>>>>> threads? This might be an oversight in the related APIs.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> This was because of a little unrelated test I wrote to accumulate the
> >>>>>>>> combined thread stack sizes. They did not add up and I was confused.
> >>>>>>>>
> >>>>>>>> I am now less confused:
> >>>>>>>>
> >>>>>>>> In  JVM_StartThread():
> >>>>>>>>
> >>>>>>>> {
> >>>>>>>>       grab thread list lock
> >>>>>>>>       new JavaThread()
> >>>>>>>>          - calls pthread_create, new thread starts and waits for
> >>>>>>>> handshake. stack base, size still NULL,0.
> >>>>>>>>       JavaThread::prepare()
> >>>>>>>>          - thread gets added to Threads list
> >>>>>>>> a } // relinquish threads lock
> >>>>>>>> ...
> >>>>>>>> ...
> >>>>>>>> Threads::start()
> >>>>>>>>       - signal new thread to run
> >>>>>>>> b  - new thread completes handshake, calls Thread::run(), and all
> >>>>>>>> implementations pretty much immediately set the stackbase/size.
> >>>>>>>>
> >>>>>>>> Between (a) and (b) another thread could grab the Threads lock,
> >>>>>>>> iterate the threads and would see the new thread with uninitialized
> >>>>>>>> base/size.
> >>>>>>>>
> >>>>>>>> To prove this I did a little test right there, in JVM_StartThread, after (a):
> >>>>>>>>
> >>>>>>>> +  {
> >>>>>>>> +    MutexLocker mu(Threads_lock);
> >>>>>>>> +    MyThreadClosure tc;
> >>>>>>>> +    Threads::threads_do(&tc);
> >>>>>>>> +  }
> >>>>>>>>
> >>>>>>>> the thread closure just prints the stack dimensions of the thrad. Sure
> >>>>>>>> enough, the new thread still has NULL/0 for stack base/size. In fact,
> >>>>>>>> Thread::stack_base() asserts because of this.
> >>>>>>>>
> >>>>>>>> So, could not another thread happen to do the same in this little time interval?
> >>>>>>>
> >>>>>>> There _should_ be some rules about how thread closures gather their
> >>>>>>> target threads based on the thread state. Part of that should (now that
> >>>>>>> I think about it) filter out threads that are still "new" - but whether
> >>>>>>> they do or not is a different matter. This may well just be an oversight.
> >>>>>>>
> >>>>>>
> >>>>>> I think so too.
> >>>>>>
> >>>>>>>>>> Is there any reason why we could not just call
> >>>>>>>>>> record_stack_base_and_size() before calling Thread::run(), right at
> >>>>>>>>>> the start of the native entry function (thread_native_entry in the
> >>>>>>>>>> case of Linux)?
> >>>>>>>>>
> >>>>>>>>> Have you tried it? :)
> >>>>>>>>>
> >>>>>>>>> I can't immediately see why this can't happen in the
> >>>>>>>>> thread_native_entry. It's possible there was once a dependency with
> >>>>>>>>> something in thread preparation etc, but that's just speculation on my part.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Funny, I wanted to test it and then it occurred to me that we do this
> >>>>>>>> all along already on AIX: At the entrace of thread_native_entry I set
> >>>>>>>> stack base and size (that was even my own code from the initial AIX
> >>>>>>>> port). Unfortunately I have lost the history for that change and do
> >>>>>>>> not know anymore why I did this.
> >>>>>>>>
> >>>>>>>> Since we never had problems on AIX I guess this is okay for other
> >>>>>>>> platforms too - as long os::current_stack_base() /
> >>>>>>>> current_stack_size() have no side effects and do not rely on anything
> >>>>>>>> but OS functionality.
> >>>>>>>
> >>>>>>> Okay so are you going to propose making such a change?
> >>>>>>>
> >>>>>>
> >>>>>> Yes. I'll prepare a webrev.
> >>>>>>
> >>>>>
> >>>>> Yikes Thread::record_stack_base_and_size() is complex. That thing's
> >>>>> name is a blatant lie, it does way more than that, especially on
> >>>>> Solaris.
> >>>>
> >>>> To be fair the problem is in the naming of os::initialize_thread() and
> >>>> the fact that on Solaris it has also been used for additional thread
> >>>> initialization when only intended for fixing up issues with the
> >>>> primordial thread's stack. And it seems only an issue with Solaris as
> >>>> everywhere else it is a empty method (and I'm unclear how it ended up as
> >>>> an <os>_<cpu>.cpp function!)
> >>>>
> >>>
> >>> Okay maybe I overreacted. The function name was probably well choosen
> >>> some time in the past and then the function grew in scope.
> >>>
> >>>> I would think you can move record_stack_base_and_size() in
> >>>> thread_native_entry; delete initialize_thread(Thread t) completely. Put
> >>>> the code that was in the Solaris initialize_thread into the Solaris
> >>>> naitve_thread_entry after record_stack_base_and_size().
> >>>>
> >>>
> >>> Replacing os::initialize_thread() with a Solaris-only local solution
> >>> seems clean and very reasonable.
> >>>
> >>> I am not so sure of moving the call to
> >>> Thread::record_stack_base_and_size() up to thread_native_entry, not in
> >>> its current form.
> >>>
> >>> I worry about this part:
> >>>
> >>> #if INCLUDE_NMT
> >>>     // record thread's native stack, stack grows downward
> >>>     MemTracker::record_thread_stack(stack_end(), stack_size());
> >>> #endif // INCLUDE_NMT
> >>>     log_debug(os, thread)("Thread " UINTX_FORMAT " stack dimensions: "
> >>>       PTR_FORMAT "-" PTR_FORMAT " (" SIZE_FORMAT "k).",
> >>>       os::current_thread_id(), p2i(stack_base() - stack_size()),
> >>>       p2i(stack_base()), stack_size()/1024);
> >>>
> >>> Both NMT recording and logging are probably sensitive to the current
> >>> Thread not fully initialized or Thread::current() not set at all
> >>> (depending on how early we want to set stack_base and size). So I
> >>> still prefer to chop this function in two, move the stack_base and
> >>> size part up to the start of native_thread_entry, and leaving the rest
> >>> where it is, under a new name.
> >>>
> >>>>> I feel unsure about moving this up to the start of the native entry
> >>>>> function, since it is not aware of its surrounding Thread class being
> >>>>> only half initialized.
> >>>>>
> >>>>> On AIX, we just call "set_stack_base" and "set_stack_size" directly,
> >>>>> and later, in shared code, run thru
> >>>>> Thread::record_stack_base_and_size() like everyone else. So, we just
> >>>>> pre-initialized base and size. I do not want to do this for all
> >>>>> platforms, since this is ugly.
> >>>>>
> >>>>> I wonder whether a better solution would be to change
> >>>>> Thread::record_stack_base_and_size():
> >>>>>
> >>>>> 1 move the calls to set_stack_base() and set_stack_size() to the OS
> >>>>> specific thread_native_entry(). We can call this right away in the
> >>>>> newly born thread, since it relies on nothing else.
> >>>>>
> >>>>> 2 leave the rest in place and rename the function to something like
> >>>>> "Thread::finish_initialization()"
> >>>>>
> >>>>> 3 move the call to Thread::finish_initialization() up out of the
> >>>>> <XXXThread>::run() functions to thread_native_entry() - just before it
> >>>>> calls Thread::run().
> >>>>>
> >>>>> 4 Alternative: to (3), make Thread::run() a non-virtual public
> >>>>> function - a place for os-generic and thread-class-generic common
> >>>>> stuff to-be-run-before-run(). Add a second, protected method, virtual
> >>>>> Thread::do_run() = 0, which is the real function child classes should
> >>>>> override.
> >>>>>
> >>>>> BTW I remember now why I did move the stack base/size initialization
> >>>>> on AIX up to the start of thread_native_entry. We had crashes in newly
> >>>>> spawned child threads and no usable hs-err files since to print a
> >>>>> stack trace you need to know the stack dimensions. Another reason to
> >>>>> change this.
> >>>>
> >>>> Well ... a thread should be doing very little interesting in terms of
> >>>> crashes prior to calling run() and setting up the stack information, and
> >>>> there will always be somewhere it can crash before this is set up, so
> >>>> perhaps the error reporter also needs to be bit more cautious about
> >>>> assuming things about the thread initialization state and the ability to
> >>>> use the Thread API.
> >>>>
> >>>
> >>> I agree. I think on AIX we are now able to print a stack trace without
> >>> knowing the stack dimensions from Thread.
> >>>
> >>> Best Regards, Thomas
> >>>
> >>>> Cheers,
> >>>> David
> >>>>
> >>>>> ..Thomas
> >>>>>
> >>>>>
> >>>>>> Cheers, Thomas
> >>>>>>
> >>>>>>> Cheers,
> >>>>>>> David
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Cheers, Thomas
> >>>>>>>>
> >>>>>>>>>> Am I missing something here?
> >>>>>>>>>
> >>>>>>>>> I guess we will find out :)
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> David
> >>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>> Thomas
> >>>>>>>>>>


More information about the hotspot-runtime-dev mailing list