openjdk9 on Freebsd

Sun Mar 20 21:59:55 UTC 2016

On 20/03/2016 4:46 PM, Brian Gardner wrote:
> Yes, I was referring to the call to ThreadLocalStorage::set_thread(NULL).
>
> The problem I'm trying to resolve are the pthread destructors being
> called repeadly.  On linux this isn’t a problem because the destructor
> is called 4 times then silently gives up.  On FreeBSD the destructor is
> called 4 times then prints a warning to stderr, which is a problem,
> although it is harmless.
>
> The message below from the original thread states there are three
> scenarios threads fall into in regards to the initial commit. The third
> scenario is the problem scenario I just mentioned and while it is ok on
> Linux, it isn’t ok on Freebsd because of the warnings to stderr.
>
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010796.html
>
> I didn’t articulate it very well but I was trying to say that calling
> ThreadLocalStorage::set_thread(NULL) durning thread cleanup is the
> proper way to prevent the destructor from being called repeatedly.  I
> actually can’t think of an alternate way to do this.
>
> In openjdk8 all threads clean themselves by calling
> ThreadLocalStorage::set_thread(NULL). I’m going to do some more research
> to locate the change sets for these calls. Perhaps they are isolated to
> bsd-port branch. I’ll let everyone know what I find.

Ah now I see what you are referring to. In JDK 8 we call 
set_thread(NULL) as part of the Thread destructor. We also call it 
explicitly on the VMThread because it's destructor is never executed.

In JDK 9 we also call set_thread(NULL) as part of the Thread destructor 
(now inside Thread::clear_thread_current) but the VMThread does not call 
this. That was a change I introduced with:

https://bugs.openjdk.java.net/browse/JDK-8132510

"Replace ThreadLocalStorage with compiler/language-based thread-local 
variables"

I did not see the point in clearing TLS for the VMThread because the 
primary purpose of that was to allow for threads detaching or 
terminating and the VMThread does neither. I don't quite understand how 
the TLS destructor gets involved in this case - does process termination 
call TLS destructors on existing threads ??

Anyway it should be harmless to add back a call to 
Thread::clear_thread_current() where previously we had:

   // Thread destructor usually does this.
   ThreadLocalStorage::set_thread(NULL);

I will file a bug for that.

Thanks,
David
-----

> Kind regards,
> Brian Gardner
>
>
>> On Mar 18, 2016, at 2:57 PM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>> On 19/03/2016 2:58 AM, Brian Gardner wrote:
>>> That explains the destructor.  Looking at the initial change set that
>>> came out of this bug, we also see the first spot where we set the TLS
>>> current thread is set to NULL
>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/469835cd5494
>>
>> Are you referring to:
>>
>> +  // Thread destructor usually does this.
>> +  ThreadLocalStorage::set_thread(NULL);
>>
>> ? That's only there because the VMThread destructor is never called.
>>
>>> So I think it’s safe to say that setting TLS current thread to NULL is
>>> the correct way to set the state to "destroying thread" and preventing
>>> the destructor from looping indefinitely.
>>
>> I'm not sure exactly what you mean.
>>
>>>
>>> Here is the relevant comment from Andreas:
>>>
>>> --------------------------------
>>>
>>> I did find a way to change the JVM to workaround this problem:
>>> By creating a destructor for the thread pointer TLS we can restore the
>>> value after pthread has set it to NULL.
>>> Then when the native code destructor is run the thread pointer is still
>>> intact.
>>>
>>> Restoring a value in a pthread TLS is explicitly supported according to
>>> the man page for pthread_key_create, and it will call the destructor for
>>> the restored value again.
>>> One would have to keep some extra state to make sure the destructor is
>>> only called twice, since a pthread implementation is allowed to call the
>>> destructor infinite times as long as the value is restored.
>>>
>>> On my system pthread calls the destructor a maximum of four times, so
>>> the attached JVM patch was sufficient as a proof of concept.
>>>
>>> ————————————————
>>
>> We implemented the basic patch, we don't do anything to ensure it was
>> called at most twice. We expect all well behaving apps to detach
>> threads from the JVM before they terminate.
>>
>> David
>> -----
>>
>>>
>>>> On Mar 17, 2016, at 9:02 PM, David Holmes <david.holmes at oracle.com
>>>> <mailto:david.holmes at oracle.com>
>>>> <mailto:david.holmes at oracle.com>> wrote:
>>>>
>>>> Thomas writes:
>>>>> Hi Brian,
>>>>>
>>>>>
>>>>>> The next patches where less straightforward.  When running java I was
>>>>>> getting a ton of messages like:
>>>>>> Thread 832744400 has exited with leftover thread-specific data after 4
>>>>>> destructor iterations
>>>>>> After doing a lot of digging and debugging on Linux, I found the
>>>>>> code path
>>>>>> for Linux was identical for Freebsd and the cleanup destructor was
>>>>>> being
>>>>>> executed 4 times just like Freebsd, the difference being that
>>>>>> Freebsd would
>>>>>> print out this benign warning while Linux would just ignore it.  The
>>>>>> problem is that all threads that are created and initialize TLS
>>>>>> current
>>>>>> thread data, must clean them up by explicitly setting the TLS current
>>>>>> thread to null.  I’ve come up with two approaches to accomplish this.
>>>>>>
>>>>>> clean up TLS current thread at end of ::run functions similar to how
>>>>>> it's
>>>>>> done in openjdk8.
>>>>>>
>>>>>> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/
>>>>>> clear current thread before exiting java_start to avoid warnings from
>>>>>> leftover pthread_setspecific data
>>>>>>
>>>>>> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/
>>>>>>
>>>>>>
>>>>>>
>>>>> I do not think this is a real leak. From what I remember of how the
>>>>> glibc
>>>>> implements TLS, setting the TLS slot value to NULL would not in itself
>>>>> delete anything. In VM, this slot keeps the pointer to the current
>>>>> Thread*,
>>>>> which is correctly deleted at the end of the thread (void
>>>>> JavaThread::thread_main_inner()).
>>>>>
>>>>> Digging further, I found the pthread key destructor
>>>>> "restore_thread_pointer(void* p)" in threadLocalStorage_posix.cpp:
>>>>>
>>>>> // Restore the thread pointer if the destructor is called. This is in
>>>>> case
>>>>> // someone from JNI code sets up a destructor with pthread_key_create
>>>>> to run
>>>>> // detachCurrentThread on thread death. Unless we restore the thread
>>>>> pointer we
>>>>> // will hang or crash. When detachCurrentThread is called the key
>>>>> will be
>>>>> set
>>>>> // to null and we will not be called again. If detachCurrentThread is
>>>>> never
>>>>> // called we could loop forever depending on the pthread
>>>>> implementation.
>>>>> extern "C" void restore_thread_pointer(void* p) {
>>>>> ThreadLocalStorage::set_thread((Thread*) p);
>>>>> }
>>>>>
>>>>> So, it seems we even reset deliberately the thread pointer to a
>>>>> non-NULL
>>>>> value. The comment claims that we reset the Thread* value in case
>>>>> there is
>>>>> another user-provided destructor which runs afterwards and which
>>>>> does detachCurrentThread () which would require Thread::current() to
>>>>> work.
>>>>> But there a details I do not understand:
>>>>>
>>>>> - At this point, should the Thread* object not already be
>>>>> deallocated, so
>>>>> this would be a dangling pointer anyway?
>>>>>
>>>>> - Also, according to Posix, this is unspecified. Doc on
>>>>> pthread_setspecific() states: "Calling pthread_setspecific() from a
>>>>> thread-specific data destructor routine may result either in lost
>>>>> storage
>>>>> (after at least PTHREAD_DESTRUCTOR_ITERATIONS attempts at
>>>>> destruction) or
>>>>> in an infinite loop."
>>>>>
>>>>> - In jdk8, we did reset the slot value to NULL before Thread exit.
>>>>> So, in
>>>>> this case detachCurrentThread() from a pthread_key destructor
>>>>> should not
>>>>> have worked at all.
>>>>>
>>>>> Could someone from Oracle maybe shed light on this?
>>>>
>>>> Please see the following discussion and bug report:
>>>>
>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010759.html
>>>>
>>>> Note I don't follow this list so please include me directly in any
>>>> follow-ups if needed.
>>>>
>>>> Thanks,
>>>> David
>