openjdk9 on Freebsd

Mon Mar 21 00:45:12 UTC 2016

A further followup/clarification ...

On 21/03/2016 8:20 AM, David Holmes wrote:
> Sorry I didn't seem this before my previous reply ...
>
> On 21/03/2016 4:53 AM, Brian Gardner wrote:
>> I looked into the origination of other calls to set_thread(NULL) in
>> openjdk8.
>>
>>   *
>> hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepThread.cpp:163:
>>
>>     ThreadLocalStorage::set_thread(NULL);
>>   *
>> hotspot/src/share/vm/gc_implementation/shared/concurrentGCThread.cpp:91:
>>     ThreadLocalStorage::set_thread(NULL);
>>   * hotspot/src/share/vm/runtime/thread.cpp:1351:
>>     ThreadLocalStorage::set_thread(NULL);
>>
>> These are use cases where set_thread(NULL) was called roughly towards
>> the end of the threads run() method.  Annotating them showed they’ve
>> been around prior to being open sourced.  At some point in time there
>> was an effort made to ensure set_thread(NULL) was always called on all
>> threads, and there is a degradation in openjdk9 in these regards.
>
> Any thread that terminates explicitly will do this as part of the Thread
> destructor. For other threads it didn't seem necessary given they would
> terminate only when the process terminated.

Some non-JavaThreads can actually terminate but AFAICS fail to ever 
destroy the associated Thread instance. That would seem to be a bug in 
itself. If we did ensure the destructor was called on these threads then 
that was also deal with the removal of the set_thread(NULL). If the 
threads do not terminate however, and are still running when the VM 
terminates, then there is no place to put the "missing" 
set_thread(NULL). Note this applies to Java threads to - any Java thread 
still running at VM termination never has set_thread(NULL) called.

So I'm still somewhat at a loss to understand when a "missing" 
set_thread(NULL) can cause a problem at VM termination?

Thanks,
David

>> If cleaning things up in the Thread destructor (Thread::~Thread) worked
>> on all platforms, it would be a clean way to handle the cleanup.  But
>> since this doesn’t get called on all platforms, this logic should have
>> been moved to JavaThread::run().  WatcherThread::run initializes it’s
>> TLS and cleans it up, why not have JavaThread::run() do the same?  I
>> don’t like JavaThread::run leaving it up to all it’s implementations to
>> clean themselves up.
>
> This is not a platform specific issue. Thread::~Thread gets called, or
> not, for the same set of threads on all platforms. The VMThread, nor
> those other threads, is not a JavaThread so JavaThread::run does not
> come into it. And for JavaThreads we don't want to do this cleanup in
> JavaThread::run because we may still need to refer to the current
> thread, so it is done during Thread::~Thread when the thread has
> "terminated" as far as the VM is concerned.
>
> The problem at hand only affects threads that never terminate/detach.
> They previously would call set_thread(NULL) (though why is unclear), but
> after my changes they don't. But that can be rectified and as I said I
> will file a bug to handle that.
>
> Please note however I am heading out on vacation in two days so this is
> unlikely to get fixed until I return in a couple of weeks.
>
> Thanks,
> David
>
>> Brian Gardner
>>
>>
>>> On Mar 19, 2016, at 11:46 PM, Brian Gardner <openjdk at getsnappy.com
>>> <mailto:openjdk at getsnappy.com>> wrote:
>>>
>>> Yes, I was referring to the call to
>>> ThreadLocalStorage::set_thread(NULL).
>>>
>>> The problem I'm trying to resolve are the pthread destructors being
>>> called repeadly.  On linux this isn’t a problem because the destructor
>>> is called 4 times then silently gives up.  On FreeBSD the destructor
>>> is called 4 times then prints a warning to stderr, which is a problem,
>>> although it is harmless.
>>>
>>> The message below from the original thread states there are three
>>> scenarios threads fall into in regards to the initial commit. The
>>> third scenario is the problem scenario I just mentioned and while it
>>> is ok on Linux, it isn’t ok on Freebsd because of the warnings to
>>> stderr.
>>>
>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010796.html
>>>
>>> I didn’t articulate it very well but I was trying to say that calling
>>> ThreadLocalStorage::set_thread(NULL) durning thread cleanup is the
>>> proper way to prevent the destructor from being called repeatedly.  I
>>> actually can’t think of an alternate way to do this.
>>> In openjdk8 all threads clean themselves by calling
>>> ThreadLocalStorage::set_thread(NULL). I’m going to do some more
>>> research to locate the change sets for these calls. Perhaps they are
>>> isolated to bsd-port branch. I’ll let everyone know what I find.
>>>
>>> Kind regards,
>>> Brian Gardner
>>>
>>>
>>>> On Mar 18, 2016, at 2:57 PM, David Holmes <david.holmes at oracle.com
>>>> <mailto:david.holmes at oracle.com>> wrote:
>>>>
>>>> On 19/03/2016 2:58 AM, Brian Gardner wrote:
>>>>> That explains the destructor.  Looking at the initial change set that
>>>>> came out of this bug, we also see the first spot where we set the TLS
>>>>> current thread is set to NULL
>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/469835cd5494
>>>>
>>>> Are you referring to:
>>>>
>>>> +  // Thread destructor usually does this.
>>>> +  ThreadLocalStorage::set_thread(NULL);
>>>>
>>>> ? That's only there because the VMThread destructor is never called.
>>>>
>>>>> So I think it’s safe to say that setting TLS current thread to NULL is
>>>>> the correct way to set the state to "destroying thread" and preventing
>>>>> the destructor from looping indefinitely.
>>>>
>>>> I'm not sure exactly what you mean.
>>>>
>>>>>
>>>>> Here is the relevant comment from Andreas:
>>>>>
>>>>> --------------------------------
>>>>>
>>>>> I did find a way to change the JVM to workaround this problem:
>>>>> By creating a destructor for the thread pointer TLS we can restore the
>>>>> value after pthread has set it to NULL.
>>>>> Then when the native code destructor is run the thread pointer is
>>>>> still
>>>>> intact.
>>>>>
>>>>> Restoring a value in a pthread TLS is explicitly supported
>>>>> according to
>>>>> the man page for pthread_key_create, and it will call the
>>>>> destructor for
>>>>> the restored value again.
>>>>> One would have to keep some extra state to make sure the destructor is
>>>>> only called twice, since a pthread implementation is allowed to
>>>>> call the
>>>>> destructor infinite times as long as the value is restored.
>>>>>
>>>>> On my system pthread calls the destructor a maximum of four times, so
>>>>> the attached JVM patch was sufficient as a proof of concept.
>>>>>
>>>>> ————————————————
>>>>
>>>> We implemented the basic patch, we don't do anything to ensure it was
>>>> called at most twice. We expect all well behaving apps to detach
>>>> threads from the JVM before they terminate.
>>>>
>>>> David
>>>> -----
>>>>
>>>>>
>>>>>> On Mar 17, 2016, at 9:02 PM, David Holmes <david.holmes at oracle.com
>>>>>> <mailto:david.holmes at oracle.com>
>>>>>> <mailto:david.holmes at oracle.com>> wrote:
>>>>>>
>>>>>> Thomas writes:
>>>>>>> Hi Brian,
>>>>>>>
>>>>>>>
>>>>>>>> The next patches where less straightforward.  When running java
>>>>>>>> I was
>>>>>>>> getting a ton of messages like:
>>>>>>>> Thread 832744400 has exited with leftover thread-specific data
>>>>>>>> after 4
>>>>>>>> destructor iterations
>>>>>>>> After doing a lot of digging and debugging on Linux, I found the
>>>>>>>> code path
>>>>>>>> for Linux was identical for Freebsd and the cleanup destructor
>>>>>>>> was being
>>>>>>>> executed 4 times just like Freebsd, the difference being that
>>>>>>>> Freebsd would
>>>>>>>> print out this benign warning while Linux would just ignore it.
>>>>>>>> The
>>>>>>>> problem is that all threads that are created and initialize TLS
>>>>>>>> current
>>>>>>>> thread data, must clean them up by explicitly setting the TLS
>>>>>>>> current
>>>>>>>> thread to null.  I’ve come up with two approaches to accomplish
>>>>>>>> this.
>>>>>>>>
>>>>>>>> clean up TLS current thread at end of ::run functions similar to
>>>>>>>> how
>>>>>>>> it's
>>>>>>>> done in openjdk8.
>>>>>>>>
>>>>>>>> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/
>>>>>>>>
>>>>>>>> clear current thread before exiting java_start to avoid warnings
>>>>>>>> from
>>>>>>>> leftover pthread_setspecific data
>>>>>>>>
>>>>>>>> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I do not think this is a real leak. From what I remember of how
>>>>>>> the glibc
>>>>>>> implements TLS, setting the TLS slot value to NULL would not in
>>>>>>> itself
>>>>>>> delete anything. In VM, this slot keeps the pointer to the current
>>>>>>> Thread*,
>>>>>>> which is correctly deleted at the end of the thread (void
>>>>>>> JavaThread::thread_main_inner()).
>>>>>>>
>>>>>>> Digging further, I found the pthread key destructor
>>>>>>> "restore_thread_pointer(void* p)" in threadLocalStorage_posix.cpp:
>>>>>>>
>>>>>>> // Restore the thread pointer if the destructor is called. This
>>>>>>> is in
>>>>>>> case
>>>>>>> // someone from JNI code sets up a destructor with
>>>>>>> pthread_key_create
>>>>>>> to run
>>>>>>> // detachCurrentThread on thread death. Unless we restore the thread
>>>>>>> pointer we
>>>>>>> // will hang or crash. When detachCurrentThread is called the key
>>>>>>> will be
>>>>>>> set
>>>>>>> // to null and we will not be called again. If
>>>>>>> detachCurrentThread is
>>>>>>> never
>>>>>>> // called we could loop forever depending on the pthread
>>>>>>> implementation.
>>>>>>> extern "C" void restore_thread_pointer(void* p) {
>>>>>>> ThreadLocalStorage::set_thread((Thread*) p);
>>>>>>> }
>>>>>>>
>>>>>>> So, it seems we even reset deliberately the thread pointer to a
>>>>>>> non-NULL
>>>>>>> value. The comment claims that we reset the Thread* value in case
>>>>>>> there is
>>>>>>> another user-provided destructor which runs afterwards and which
>>>>>>> does detachCurrentThread () which would require Thread::current() to
>>>>>>> work.
>>>>>>> But there a details I do not understand:
>>>>>>>
>>>>>>> - At this point, should the Thread* object not already be
>>>>>>> deallocated, so
>>>>>>> this would be a dangling pointer anyway?
>>>>>>>
>>>>>>> - Also, according to Posix, this is unspecified. Doc on
>>>>>>> pthread_setspecific() states: "Calling pthread_setspecific() from a
>>>>>>> thread-specific data destructor routine may result either in lost
>>>>>>> storage
>>>>>>> (after at least PTHREAD_DESTRUCTOR_ITERATIONS attempts at
>>>>>>> destruction) or
>>>>>>> in an infinite loop."
>>>>>>>
>>>>>>> - In jdk8, we did reset the slot value to NULL before Thread exit.
>>>>>>> So, in
>>>>>>> this case detachCurrentThread() from a pthread_key destructor
>>>>>>> should not
>>>>>>> have worked at all.
>>>>>>>
>>>>>>> Could someone from Oracle maybe shed light on this?
>>>>>>
>>>>>> Please see the following discussion and bug report:
>>>>>>
>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010759.html
>>>>>>
>>>>>>
>>>>>> Note I don't follow this list so please include me directly in any
>>>>>> follow-ups if needed.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>
>>>
>>