openjdk9 on Freebsd

Wed Apr 20 04:35:35 UTC 2016

FYI I filed:

https://bugs.openjdk.java.net/browse/JDK-8154715

David
-----

On 21/03/2016 10:45 AM, David Holmes wrote:
> A further followup/clarification ...
>
> On 21/03/2016 8:20 AM, David Holmes wrote:
>> Sorry I didn't seem this before my previous reply ...
>>
>> On 21/03/2016 4:53 AM, Brian Gardner wrote:
>>> I looked into the origination of other calls to set_thread(NULL) in
>>> openjdk8.
>>>
>>>   *
>>> hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepThread.cpp:163:
>>>
>>>
>>>     ThreadLocalStorage::set_thread(NULL);
>>>   *
>>> hotspot/src/share/vm/gc_implementation/shared/concurrentGCThread.cpp:91:
>>>     ThreadLocalStorage::set_thread(NULL);
>>>   * hotspot/src/share/vm/runtime/thread.cpp:1351:
>>>     ThreadLocalStorage::set_thread(NULL);
>>>
>>> These are use cases where set_thread(NULL) was called roughly towards
>>> the end of the threads run() method.  Annotating them showed they’ve
>>> been around prior to being open sourced.  At some point in time there
>>> was an effort made to ensure set_thread(NULL) was always called on all
>>> threads, and there is a degradation in openjdk9 in these regards.
>>
>> Any thread that terminates explicitly will do this as part of the Thread
>> destructor. For other threads it didn't seem necessary given they would
>> terminate only when the process terminated.
>
> Some non-JavaThreads can actually terminate but AFAICS fail to ever
> destroy the associated Thread instance. That would seem to be a bug in
> itself. If we did ensure the destructor was called on these threads then
> that was also deal with the removal of the set_thread(NULL). If the
> threads do not terminate however, and are still running when the VM
> terminates, then there is no place to put the "missing"
> set_thread(NULL). Note this applies to Java threads to - any Java thread
> still running at VM termination never has set_thread(NULL) called.
>
> So I'm still somewhat at a loss to understand when a "missing"
> set_thread(NULL) can cause a problem at VM termination?
>
> Thanks,
> David
>
>>> If cleaning things up in the Thread destructor (Thread::~Thread) worked
>>> on all platforms, it would be a clean way to handle the cleanup.  But
>>> since this doesn’t get called on all platforms, this logic should have
>>> been moved to JavaThread::run().  WatcherThread::run initializes it’s
>>> TLS and cleans it up, why not have JavaThread::run() do the same?  I
>>> don’t like JavaThread::run leaving it up to all it’s implementations to
>>> clean themselves up.
>>
>> This is not a platform specific issue. Thread::~Thread gets called, or
>> not, for the same set of threads on all platforms. The VMThread, nor
>> those other threads, is not a JavaThread so JavaThread::run does not
>> come into it. And for JavaThreads we don't want to do this cleanup in
>> JavaThread::run because we may still need to refer to the current
>> thread, so it is done during Thread::~Thread when the thread has
>> "terminated" as far as the VM is concerned.
>>
>> The problem at hand only affects threads that never terminate/detach.
>> They previously would call set_thread(NULL) (though why is unclear), but
>> after my changes they don't. But that can be rectified and as I said I
>> will file a bug to handle that.
>>
>> Please note however I am heading out on vacation in two days so this is
>> unlikely to get fixed until I return in a couple of weeks.
>>
>> Thanks,
>> David
>>
>>> Brian Gardner
>>>
>>>
>>>> On Mar 19, 2016, at 11:46 PM, Brian Gardner <openjdk at getsnappy.com
>>>> <mailto:openjdk at getsnappy.com>> wrote:
>>>>
>>>> Yes, I was referring to the call to
>>>> ThreadLocalStorage::set_thread(NULL).
>>>>
>>>> The problem I'm trying to resolve are the pthread destructors being
>>>> called repeadly.  On linux this isn’t a problem because the destructor
>>>> is called 4 times then silently gives up.  On FreeBSD the destructor
>>>> is called 4 times then prints a warning to stderr, which is a problem,
>>>> although it is harmless.
>>>>
>>>> The message below from the original thread states there are three
>>>> scenarios threads fall into in regards to the initial commit. The
>>>> third scenario is the problem scenario I just mentioned and while it
>>>> is ok on Linux, it isn’t ok on Freebsd because of the warnings to
>>>> stderr.
>>>>
>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010796.html
>>>>
>>>>
>>>> I didn’t articulate it very well but I was trying to say that calling
>>>> ThreadLocalStorage::set_thread(NULL) durning thread cleanup is the
>>>> proper way to prevent the destructor from being called repeatedly.  I
>>>> actually can’t think of an alternate way to do this.
>>>> In openjdk8 all threads clean themselves by calling
>>>> ThreadLocalStorage::set_thread(NULL). I’m going to do some more
>>>> research to locate the change sets for these calls. Perhaps they are
>>>> isolated to bsd-port branch. I’ll let everyone know what I find.
>>>>
>>>> Kind regards,
>>>> Brian Gardner
>>>>
>>>>
>>>>> On Mar 18, 2016, at 2:57 PM, David Holmes <david.holmes at oracle.com
>>>>> <mailto:david.holmes at oracle.com>> wrote:
>>>>>
>>>>> On 19/03/2016 2:58 AM, Brian Gardner wrote:
>>>>>> That explains the destructor.  Looking at the initial change set that
>>>>>> came out of this bug, we also see the first spot where we set the TLS
>>>>>> current thread is set to NULL
>>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/469835cd5494
>>>>>
>>>>> Are you referring to:
>>>>>
>>>>> +  // Thread destructor usually does this.
>>>>> +  ThreadLocalStorage::set_thread(NULL);
>>>>>
>>>>> ? That's only there because the VMThread destructor is never called.
>>>>>
>>>>>> So I think it’s safe to say that setting TLS current thread to
>>>>>> NULL is
>>>>>> the correct way to set the state to "destroying thread" and
>>>>>> preventing
>>>>>> the destructor from looping indefinitely.
>>>>>
>>>>> I'm not sure exactly what you mean.
>>>>>
>>>>>>
>>>>>> Here is the relevant comment from Andreas:
>>>>>>
>>>>>> --------------------------------
>>>>>>
>>>>>> I did find a way to change the JVM to workaround this problem:
>>>>>> By creating a destructor for the thread pointer TLS we can restore
>>>>>> the
>>>>>> value after pthread has set it to NULL.
>>>>>> Then when the native code destructor is run the thread pointer is
>>>>>> still
>>>>>> intact.
>>>>>>
>>>>>> Restoring a value in a pthread TLS is explicitly supported
>>>>>> according to
>>>>>> the man page for pthread_key_create, and it will call the
>>>>>> destructor for
>>>>>> the restored value again.
>>>>>> One would have to keep some extra state to make sure the
>>>>>> destructor is
>>>>>> only called twice, since a pthread implementation is allowed to
>>>>>> call the
>>>>>> destructor infinite times as long as the value is restored.
>>>>>>
>>>>>> On my system pthread calls the destructor a maximum of four times, so
>>>>>> the attached JVM patch was sufficient as a proof of concept.
>>>>>>
>>>>>> ————————————————
>>>>>
>>>>> We implemented the basic patch, we don't do anything to ensure it was
>>>>> called at most twice. We expect all well behaving apps to detach
>>>>> threads from the JVM before they terminate.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>>
>>>>>>> On Mar 17, 2016, at 9:02 PM, David Holmes <david.holmes at oracle.com
>>>>>>> <mailto:david.holmes at oracle.com>
>>>>>>> <mailto:david.holmes at oracle.com>> wrote:
>>>>>>>
>>>>>>> Thomas writes:
>>>>>>>> Hi Brian,
>>>>>>>>
>>>>>>>>
>>>>>>>>> The next patches where less straightforward.  When running java
>>>>>>>>> I was
>>>>>>>>> getting a ton of messages like:
>>>>>>>>> Thread 832744400 has exited with leftover thread-specific data
>>>>>>>>> after 4
>>>>>>>>> destructor iterations
>>>>>>>>> After doing a lot of digging and debugging on Linux, I found the
>>>>>>>>> code path
>>>>>>>>> for Linux was identical for Freebsd and the cleanup destructor
>>>>>>>>> was being
>>>>>>>>> executed 4 times just like Freebsd, the difference being that
>>>>>>>>> Freebsd would
>>>>>>>>> print out this benign warning while Linux would just ignore it.
>>>>>>>>> The
>>>>>>>>> problem is that all threads that are created and initialize TLS
>>>>>>>>> current
>>>>>>>>> thread data, must clean them up by explicitly setting the TLS
>>>>>>>>> current
>>>>>>>>> thread to null.  I’ve come up with two approaches to accomplish
>>>>>>>>> this.
>>>>>>>>>
>>>>>>>>> clean up TLS current thread at end of ::run functions similar to
>>>>>>>>> how
>>>>>>>>> it's
>>>>>>>>> done in openjdk8.
>>>>>>>>>
>>>>>>>>> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> clear current thread before exiting java_start to avoid warnings
>>>>>>>>> from
>>>>>>>>> leftover pthread_setspecific data
>>>>>>>>>
>>>>>>>>> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I do not think this is a real leak. From what I remember of how
>>>>>>>> the glibc
>>>>>>>> implements TLS, setting the TLS slot value to NULL would not in
>>>>>>>> itself
>>>>>>>> delete anything. In VM, this slot keeps the pointer to the current
>>>>>>>> Thread*,
>>>>>>>> which is correctly deleted at the end of the thread (void
>>>>>>>> JavaThread::thread_main_inner()).
>>>>>>>>
>>>>>>>> Digging further, I found the pthread key destructor
>>>>>>>> "restore_thread_pointer(void* p)" in threadLocalStorage_posix.cpp:
>>>>>>>>
>>>>>>>> // Restore the thread pointer if the destructor is called. This
>>>>>>>> is in
>>>>>>>> case
>>>>>>>> // someone from JNI code sets up a destructor with
>>>>>>>> pthread_key_create
>>>>>>>> to run
>>>>>>>> // detachCurrentThread on thread death. Unless we restore the
>>>>>>>> thread
>>>>>>>> pointer we
>>>>>>>> // will hang or crash. When detachCurrentThread is called the key
>>>>>>>> will be
>>>>>>>> set
>>>>>>>> // to null and we will not be called again. If
>>>>>>>> detachCurrentThread is
>>>>>>>> never
>>>>>>>> // called we could loop forever depending on the pthread
>>>>>>>> implementation.
>>>>>>>> extern "C" void restore_thread_pointer(void* p) {
>>>>>>>> ThreadLocalStorage::set_thread((Thread*) p);
>>>>>>>> }
>>>>>>>>
>>>>>>>> So, it seems we even reset deliberately the thread pointer to a
>>>>>>>> non-NULL
>>>>>>>> value. The comment claims that we reset the Thread* value in case
>>>>>>>> there is
>>>>>>>> another user-provided destructor which runs afterwards and which
>>>>>>>> does detachCurrentThread () which would require
>>>>>>>> Thread::current() to
>>>>>>>> work.
>>>>>>>> But there a details I do not understand:
>>>>>>>>
>>>>>>>> - At this point, should the Thread* object not already be
>>>>>>>> deallocated, so
>>>>>>>> this would be a dangling pointer anyway?
>>>>>>>>
>>>>>>>> - Also, according to Posix, this is unspecified. Doc on
>>>>>>>> pthread_setspecific() states: "Calling pthread_setspecific() from a
>>>>>>>> thread-specific data destructor routine may result either in lost
>>>>>>>> storage
>>>>>>>> (after at least PTHREAD_DESTRUCTOR_ITERATIONS attempts at
>>>>>>>> destruction) or
>>>>>>>> in an infinite loop."
>>>>>>>>
>>>>>>>> - In jdk8, we did reset the slot value to NULL before Thread exit.
>>>>>>>> So, in
>>>>>>>> this case detachCurrentThread() from a pthread_key destructor
>>>>>>>> should not
>>>>>>>> have worked at all.
>>>>>>>>
>>>>>>>> Could someone from Oracle maybe shed light on this?
>>>>>>>
>>>>>>> Please see the following discussion and bug report:
>>>>>>>
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010759.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Note I don't follow this list so please include me directly in any
>>>>>>> follow-ups if needed.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>
>>>>
>>>