Hello,

I’ve been working on getting jdk9 to build cleanly, and to compile and run HelloWorld cleanly, on FreeBSD. This follows up on a couple of patches released by Magnus Ihse Bursie, who works at Oracle. He is interested in getting some of these changes integrated into the jdk9 mainline.

Here are his patches:

- WebRev for build changes: http://cr.openjdk.java.net/~ihse/JDK-8147795-build-system-support-for-bsd/webrev.01
- To make this compile properly on BSD, some source code changes are also needed. Here is a simple patch that fixes the compilation show-stoppers. Note that this is *not* part of this bug: http://cr.openjdk.java.net/~ihse/JDK-8147795_addendum-bsd-source-patches/webrev.01/

Here are my patches, which supplement Magnus’s:

- Port build_vm_def.sh from the bsd-port/jdk8 repo, to fix NM errors during the build: http://brian.timestudybuddy.com/webrev/hotspot__NM/webrev/
- Add the SUPPORT_RESERVED_STACK_AREA flag for all BSDs: http://brian.timestudybuddy.com/webrev/hotspot__SUPPORT_RESERVED_STACK_AREA/webrev/
- Port the getthreadid logic from bsd-port/jdk8. Calling syscall(SYS_thr_self) caused pthread_setspecific to be cleared: http://brian.timestudybuddy.com/webrev/hotspot__os_bsd_cpp__getthreadid/webrev/
- Add the serviceability agent, ported from bsd-port/jdk8: http://brian.timestudybuddy.com/webrev/hotspot__sa/webrev/
- Add classlist.bsd, identical to classlist.linux, so that the build compiles: http://brian.timestudybuddy.com/webrev/jdk__classlist-bsd/webrev/

The next patches were less straightforward. When running java I was getting a ton of messages like:

Thread 832744400 has exited with leftover thread-specific data after 4 destructor iterations

After a lot of digging and debugging on Linux, I found that the code path on Linux is identical to the one on FreeBSD, and that the cleanup destructor was executed 4 times, just as on FreeBSD. The difference is that FreeBSD prints this benign warning, while Linux silently ignores it. The problem is that every thread that is created and initializes the TLS current-thread data must clean it up by explicitly setting the TLS current thread to NULL. I’ve come up with two approaches to accomplish this:

- Clean up the TLS current thread at the end of the ::run functions, similar to how it's done in openjdk8: http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/
- Clear the current thread before exiting java_start, to avoid warnings from leftover pthread_setspecific data: http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/

With all these patches I’ve accomplished my initial goal of getting jdk9 to build cleanly, and to compile and run HelloWorld cleanly, on FreeBSD.

Thanks,
Brian Gardner
Is anyone else interested in getting these changes into the jdk9 mainline? It’s been over a month without a response.
On Feb 15, 2016, at 6:13 PM, Brian Gardner <openjdk@getsnappy.com> wrote:
Hello, I’ve been working on getting jdk9 to build cleanly, and to compile and run HelloWorld cleanly, on FreeBSD. This was following up on a couple of patches released by Magnus Ihse Bursie, who works at Oracle. He is interested in getting some of these changes integrated into the jdk9 mainline.
Hi Brian,
I do not think this is a real leak. From what I remember of how glibc implements TLS, setting the TLS slot value to NULL would not in itself delete anything. In the VM, this slot holds the pointer to the current Thread*, which is correctly deleted at the end of the thread (void JavaThread::thread_main_inner()).

Digging further, I found the pthread key destructor "restore_thread_pointer(void* p)" in threadLocalStorage_posix.cpp:

    // Restore the thread pointer if the destructor is called. This is in case
    // someone from JNI code sets up a destructor with pthread_key_create to run
    // detachCurrentThread on thread death. Unless we restore the thread pointer we
    // will hang or crash. When detachCurrentThread is called the key will be set
    // to null and we will not be called again. If detachCurrentThread is never
    // called we could loop forever depending on the pthread implementation.
    extern "C" void restore_thread_pointer(void* p) {
      ThreadLocalStorage::set_thread((Thread*) p);
    }

So it seems we even deliberately reset the thread pointer to a non-NULL value. The comment claims that we reset the Thread* value in case another, user-provided destructor runs afterwards and calls DetachCurrentThread(), which requires Thread::current() to work. But there are details I do not understand:

- At this point, shouldn't the Thread* object already be deallocated, so that this would be a dangling pointer anyway?
- Also, according to POSIX, this is unspecified. The documentation for pthread_setspecific() states: "Calling pthread_setspecific() from a thread-specific data destructor routine may result either in lost storage (after at least PTHREAD_DESTRUCTOR_ITERATIONS attempts at destruction) or in an infinite loop."
- In jdk8, we did reset the slot value to NULL before thread exit. So in that case, DetachCurrentThread() from a pthread_key destructor should not have worked at all.

Could someone from Oracle maybe shed light on this?
Kind Regards, Thomas
Thomas writes:
Could someone from Oracle maybe shed light on this?
Please see the following discussion and bug report:
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010...

Note: I don't follow this list, so please include me directly in any follow-ups if needed.

Thanks,
David
That explains the destructor. Looking at the initial change set that came out of this bug, we also see the first spot where the TLS current thread is set to NULL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/469835cd5494

So I think it’s safe to say that setting the TLS current thread to NULL is the correct way to set the state to "destroying thread" and to prevent the destructor from looping indefinitely.

Here is the relevant comment from Andreas:

--------------------------------
I did find a way to change the JVM to workaround this problem: By creating a destructor for the thread pointer TLS we can restore the value after pthread has set it to NULL. Then when the native code destructor is run the thread pointer is still intact.

Restoring a value in a pthread TLS is explicitly supported according to the man page for pthread_key_create, and it will call the destructor for the restored value again. One would have to keep some extra state to make sure the destructor is only called twice, since a pthread implementation is allowed to call the destructor infinite times as long as the value is restored.

On my system pthread calls the destructor a maximum of four times, so the attached JVM patch was sufficient as a proof of concept.
--------------------------------
On Mar 17, 2016, at 9:02 PM, David Holmes <david.holmes@oracle.com> wrote:
Please see the following discussion and bug report:
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010...
Note I don't follow this list so please include me directly in any follow-ups if needed.
Thanks, David
On 19/03/2016 2:58 AM, Brian Gardner wrote:
That explains the destructor. Looking at the initial change set that came out of this bug, we also see the first spot where the TLS current thread is set to NULL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/469835cd5494
Are you referring to:

+ // Thread destructor usually does this.
+ ThreadLocalStorage::set_thread(NULL);

? That's only there because the VMThread destructor is never called.
So I think it’s safe to say that setting TLS current thread to NULL is the correct way to set the state to "destroying thread" and preventing the destructor from looping indefinitely.
I'm not sure exactly what you mean.
Here is the relevant comment from Andreas:
--------------------------------
I did find a way to change the JVM to workaround this problem: By creating a destructor for the thread pointer TLS we can restore the value after pthread has set it to NULL. Then when the native code destructor is run the thread pointer is still intact.
Restoring a value in a pthread TLS is explicitly supported according to the man page for pthread_key_create, and it will call the destructor for the restored value again. One would have to keep some extra state to make sure the destructor is only called twice, since a pthread implementation is allowed to call the destructor infinite times as long as the value is restored.
On my system pthread calls the destructor a maximum of four times, so the attached JVM patch was sufficient as a proof of concept.
————————————————
We implemented the basic patch; we don't do anything to ensure it is called at most twice. We expect all well-behaved apps to detach threads from the JVM before they terminate.

David
Yes, I was referring to the call to ThreadLocalStorage::set_thread(NULL).

The problem I’m trying to resolve is the pthread destructors being called repeatedly. On Linux this isn’t a problem, because the destructor is called 4 times and then pthreads silently gives up. On FreeBSD the destructor is called 4 times and then a warning is printed to stderr, which is a problem even though it is harmless.

The message below from the original thread states that there are three scenarios threads fall into with regard to the initial commit. The third scenario is the problem scenario I just mentioned; while it is OK on Linux, it isn’t OK on FreeBSD because of the warnings to stderr.
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010796.html

I didn’t articulate it very well, but I was trying to say that calling ThreadLocalStorage::set_thread(NULL) during thread cleanup is the proper way to prevent the destructor from being called repeatedly. I actually can’t think of an alternative way to do this. In openjdk8 all threads clean themselves up by calling ThreadLocalStorage::set_thread(NULL). I’m going to do some more research to locate the change sets for these calls. Perhaps they are isolated to the bsd-port branch. I’ll let everyone know what I find.

Kind regards,
Brian Gardner
On Mar 18, 2016, at 2:57 PM, David Holmes <david.holmes@oracle.com> wrote:
We implemented the basic patch; we don't do anything to ensure it is called at most twice. We expect all well-behaved apps to detach threads from the JVM before they terminate.

David
I looked into the origin of the other calls to set_thread(NULL) in openjdk8:

hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepThread.cpp:163: ThreadLocalStorage::set_thread(NULL);
hotspot/src/share/vm/gc_implementation/shared/concurrentGCThread.cpp:91: ThreadLocalStorage::set_thread(NULL);
hotspot/src/share/vm/runtime/thread.cpp:1351: ThreadLocalStorage::set_thread(NULL);

In each of these, set_thread(NULL) is called roughly towards the end of the thread’s run() method. Annotating them showed they’ve been around since before the code was open sourced. At some point there was an effort to ensure set_thread(NULL) was always called on all threads, and openjdk9 has regressed in this regard.

If cleaning things up in the Thread destructor (Thread::~Thread) worked on all platforms, it would be a clean way to handle the cleanup. But since that destructor doesn’t get called on all platforms, this logic should have been moved to JavaThread::run(). WatcherThread::run initializes its TLS and cleans it up, so why not have JavaThread::run() do the same? I don’t like JavaThread::run leaving it up to all its implementations to clean themselves up.

Brian Gardner
On Mar 19, 2016, at 11:46 PM, Brian Gardner <openjdk@getsnappy.com> wrote:
Yes, I was referring to the call to ThreadLocalStorage::set_thread(NULL).
The problem I'm trying to resolve are the pthread destructors being called repeadly. On linux this isn’t a problem because the destructor is called 4 times then silently gives up. On FreeBSD the destructor is called 4 times then prints a warning to stderr, which is a problem, although it is harmless.
The message below from the original thread states there are three scenarios threads fall into in regards to the initial commit. The third scenario is the problem scenario I just mentioned and while it is ok on Linux, it isn’t ok on Freebsd because of the warnings to stderr.
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010... <http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010796.html> I didn’t articulate it very well but I was trying to say that calling ThreadLocalStorage::set_thread(NULL) durning thread cleanup is the proper way to prevent the destructor from being called repeatedly. I actually can’t think of an alternate way to do this. In openjdk8 all threads clean themselves by calling ThreadLocalStorage::set_thread(NULL). I’m going to do some more research to locate the change sets for these calls. Perhaps they are isolated to bsd-port branch. I’ll let everyone know what I find.
Kind regards, Brian Gardner
On Mar 18, 2016, at 2:57 PM, David Holmes <david.holmes@oracle.com> wrote:
On 19/03/2016 2:58 AM, Brian Gardner wrote:
That explains the destructor. Looking at the initial changeset that came out of this bug, we also see the first spot where the TLS current thread is set to NULL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/469835cd5494
Are you referring to:
+ // Thread destructor usually does this.
+ ThreadLocalStorage::set_thread(NULL);
? That's only there because the VMThread destructor is never called.
So I think it’s safe to say that setting the TLS current thread to NULL is the correct way to set the state to "destroying thread" and to prevent the destructor from looping indefinitely.
I'm not sure exactly what you mean.
Here is the relevant comment from Andreas:
--------------------------------
I did find a way to change the JVM to work around this problem: By creating a destructor for the thread pointer TLS we can restore the value after pthread has set it to NULL. Then when the native code destructor is run the thread pointer is still intact.
Restoring a value in a pthread TLS is explicitly supported according to the man page for pthread_key_create, and it will call the destructor for the restored value again. One would have to keep some extra state to make sure the destructor is only called twice, since a pthread implementation is allowed to call the destructor an unlimited number of times as long as the value is restored.
On my system pthread calls the destructor a maximum of four times, so the attached JVM patch was sufficient as a proof of concept.
————————————————
We implemented the basic patch; we don't do anything to ensure the destructor is called at most twice. We expect all well-behaved apps to detach threads from the JVM before they terminate.
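A minimal sketch of the extra state Andreas describes, which the basic patch deliberately omits: the key destructor restores the pointer only on its first call, so pthread calls it exactly twice and then stops. It uses the raw pthread API; TrackedThread, restore_once and restore_once_rounds are hypothetical names, and the sketch assumes an implementation that allows at least two destructor rounds (POSIX requires at least four before an implementation may give up).

```cpp
#include <pthread.h>

// Hypothetical stand-in for Thread; it carries the extra state needed so the
// key destructor can tell which round it is in.
struct TrackedThread {
  int destructor_rounds = 0;
};

static pthread_key_t g_tls_key;
static int g_rounds_seen = 0;

// Key destructor that restores the pointer exactly once.
extern "C" void restore_once(void* p) {
  TrackedThread* t = static_cast<TrackedThread*>(p);
  ++t->destructor_rounds;
  if (t->destructor_rounds == 1) {
    // Round 1: put the pointer back so a destructor registered by native
    // code (e.g. one that detaches the thread) can still find it.
    pthread_setspecific(g_tls_key, t);
  } else {
    // Round 2: leave the slot NULL so pthread stops iterating, then free.
    g_rounds_seen = t->destructor_rounds;
    delete t;
  }
}

// A thread that exits with a non-NULL value still in the slot (no detach).
extern "C" void* exit_without_detach(void*) {
  pthread_setspecific(g_tls_key, new TrackedThread());
  return nullptr;
}

// Returns how many rounds the key destructor ran for one such thread.
int restore_once_rounds() {
  g_rounds_seen = 0;
  pthread_key_create(&g_tls_key, restore_once);
  pthread_t tid;
  pthread_create(&tid, nullptr, exit_without_detach, nullptr);
  pthread_join(tid, nullptr);
  pthread_key_delete(g_tls_key);
  return g_rounds_seen;
}
```

restore_once_rounds() returns 2, and no value is left in the slot when the thread finally exits, so there is nothing for FreeBSD to warn about.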
David -----
On Mar 17, 2016, at 9:02 PM, David Holmes <david.holmes@oracle.com> wrote:
Thomas writes:
Hi Brian,
The next patches were less straightforward. When running java I was getting a ton of messages like:

Thread 832744400 has exited with leftover thread-specific data after 4 destructor iterations

After doing a lot of digging and debugging on Linux, I found that the code path for Linux was identical to FreeBSD's and the cleanup destructor was being executed 4 times just as on FreeBSD, the difference being that FreeBSD would print out this benign warning while Linux would silently ignore it. The problem is that all threads that are created and initialize TLS current thread data must clean it up by explicitly setting the TLS current thread to null. I’ve come up with two approaches to accomplish this.
Clean up the TLS current thread at the end of the ::run functions, similar to how it's done in openjdk8:
http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/

Clear the current thread before exiting java_start to avoid warnings from leftover pthread_setspecific data:
http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/web...
I do not think this is a real leak. From what I remember of how glibc implements TLS, setting the TLS slot value to NULL would not in itself delete anything. In the VM, this slot keeps the pointer to the current Thread*, which is correctly deleted at the end of the thread (void JavaThread::thread_main_inner()).
Digging further, I found the pthread key destructor "restore_thread_pointer(void* p)" in threadLocalStorage_posix.cpp:
// Restore the thread pointer if the destructor is called. This is in case
// someone from JNI code sets up a destructor with pthread_key_create to run
// detachCurrentThread on thread death. Unless we restore the thread pointer we
// will hang or crash. When detachCurrentThread is called the key will be set
// to null and we will not be called again. If detachCurrentThread is never
// called we could loop forever depending on the pthread implementation.
extern "C" void restore_thread_pointer(void* p) {
  ThreadLocalStorage::set_thread((Thread*) p);
}
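To see how many rounds an unconditional restore gets, here is a hedged standalone sketch whose key destructor re-installs its argument every time, in the same way as restore_thread_pointer. PlainThread, restore_always and restore_always_calls are hypothetical names, and kSafetyCap is an assumption added only so the demo cannot spin forever on an implementation that never gives up. On glibc the count is PTHREAD_DESTRUCTOR_ITERATIONS (4); per the FreeBSD findings in this thread, this is also the situation that produces the "leftover thread-specific data" warning there.

```cpp
#include <pthread.h>

// Hypothetical stand-in for the Thread object; static so nothing leaks.
struct PlainThread {};
static PlainThread g_thread_object;

static pthread_key_t g_restore_key;
static int g_restore_calls = 0;
// Assumption: cap restores so this demo terminates even where POSIX allows
// an implementation to keep calling destructors indefinitely.
static const int kSafetyCap = 1000;

// Same idea as restore_thread_pointer(): put the value back every time.
extern "C" void restore_always(void* p) {
  ++g_restore_calls;
  if (g_restore_calls < kSafetyCap) {
    pthread_setspecific(g_restore_key, p);
  }
}

// A thread that exits without ever clearing the slot (no detach).
extern "C" void* exit_with_value_set(void*) {
  pthread_setspecific(g_restore_key, &g_thread_object);
  return nullptr;
}

// Returns how many times pthread invoked the destructor before giving up.
int restore_always_calls() {
  g_restore_calls = 0;
  pthread_key_create(&g_restore_key, restore_always);
  pthread_t tid;
  pthread_create(&tid, nullptr, exit_with_value_set, nullptr);
  pthread_join(tid, nullptr);
  pthread_key_delete(g_restore_key);
  return g_restore_calls;
}
```

On glibc restore_always_calls() returns 4; POSIX only guarantees it is at least PTHREAD_DESTRUCTOR_ITERATIONS when the implementation gives up at all.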
So it seems we even deliberately reset the thread pointer to a non-NULL value. The comment claims that we reset the Thread* value in case there is another user-provided destructor which runs afterwards and calls detachCurrentThread(), which would require Thread::current() to work. But there are details I do not understand:
- At this point, should the Thread* object not already be deallocated, so this would be a dangling pointer anyway?
- Also, according to Posix, this is unspecified. Doc on pthread_setspecific() states: "Calling pthread_setspecific() from a thread-specific data destructor routine may result either in lost storage (after at least PTHREAD_DESTRUCTOR_ITERATIONS attempts at destruction) or in an infinite loop."
- In jdk8, we did reset the slot value to NULL before Thread exit. So, in this case detachCurrentThread() from a pthread_key destructor should not have worked at all.
Could someone from Oracle maybe shed light on this?
Please see the following discussion and bug report:
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010759.html
Note I don't follow this list so please include me directly in any follow-ups if needed.
Thanks, David
Sorry I didn't see this before my previous reply ...

On 21/03/2016 4:53 AM, Brian Gardner wrote:
I looked into the origination of other calls to set_thread(NULL) in openjdk8.
* hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepThread.cpp:163: ThreadLocalStorage::set_thread(NULL);
* hotspot/src/share/vm/gc_implementation/shared/concurrentGCThread.cpp:91: ThreadLocalStorage::set_thread(NULL);
* hotspot/src/share/vm/runtime/thread.cpp:1351: ThreadLocalStorage::set_thread(NULL);
These are cases where set_thread(NULL) was called roughly at the end of the thread's run() method. Annotating them showed they’ve been around since before the code was open sourced. At some point an effort was made to ensure set_thread(NULL) was always called on all threads, and openjdk9 has regressed in this regard.
Any thread that terminates explicitly will do this as part of the Thread destructor. For other threads it didn't seem necessary given they would terminate only when the process terminated.
If cleaning things up in the Thread destructor (Thread::~Thread) worked on all platforms, it would be a clean way to handle the cleanup. But since the destructor doesn’t get called on all platforms, this logic should have been moved to JavaThread::run(). WatcherThread::run initializes its TLS and cleans it up, so why not have JavaThread::run() do the same? I don’t like JavaThread::run leaving it up to all its implementations to clean themselves up.
This is not a platform-specific issue. Thread::~Thread gets called, or not, for the same set of threads on all platforms. Neither the VMThread nor those other threads are JavaThreads, so JavaThread::run does not come into it. And for JavaThreads we don't want to do this cleanup in JavaThread::run because we may still need to refer to the current thread, so it is done during Thread::~Thread when the thread has "terminated" as far as the VM is concerned.

The problem at hand only affects threads that never terminate/detach. They previously would call set_thread(NULL) (though why is unclear), but after my changes they don't. That can be rectified, and as I said I will file a bug to handle it.

Please note, however, that I am heading out on vacation in two days, so this is unlikely to get fixed until I return in a couple of weeks.

Thanks,
David
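A minimal sketch of the arrangement described here, again using raw pthread calls and hypothetical names (SketchThread, note_leftover and leftover_calls are not HotSpot code): the object's destructor clears the TLS slot, so a thread whose object is destroyed exits cleanly, while a thread whose object is never destroyed still has a value in the slot at exit and triggers the key destructor.

```cpp
#include <pthread.h>

static pthread_key_t g_cur_key;
static int g_leftover_calls = 0;

// Hypothetical stand-in for Thread: installs itself as "current" and, like
// Thread::~Thread as described above, clears the slot when destroyed.
class SketchThread {
 public:
  SketchThread() { pthread_setspecific(g_cur_key, this); }
  ~SketchThread() {
    if (pthread_getspecific(g_cur_key) == this) {
      pthread_setspecific(g_cur_key, nullptr);
    }
  }
};

// Key destructor: runs only for threads that exit with the slot still set.
extern "C" void note_leftover(void* p) {
  ++g_leftover_calls;
  // Freed here only so the demo does not leak; pthread has already set the
  // slot to NULL, so ~SketchThread leaves it alone.
  delete static_cast<SketchThread*>(p);
}

// A non-NULL arg means the thread destroys its object before returning.
extern "C" void* sketch_body(void* arg) {
  SketchThread* t = new SketchThread();
  if (arg != nullptr) {
    delete t;  // the thread has "terminated" as far as the VM is concerned
  }
  return nullptr;  // otherwise t stays installed, like a never-destroyed Thread
}

// Starts one thread and reports how often the key destructor ran for it.
int leftover_calls(bool destroy_thread_object) {
  static int destroy_flag;  // only its address is used, as a non-NULL marker
  g_leftover_calls = 0;
  pthread_key_create(&g_cur_key, note_leftover);
  pthread_t tid;
  pthread_create(&tid, nullptr, sketch_body,
                 destroy_thread_object ? &destroy_flag : nullptr);
  pthread_join(tid, nullptr);
  pthread_key_delete(g_cur_key);
  return g_leftover_calls;
}
```

leftover_calls(true) returns 0 and leftover_calls(false) returns 1, which matches the point that only threads that never terminate/detach are affected.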
A further followup/clarification ...

On 21/03/2016 8:20 AM, David Holmes wrote:
Any thread that terminates explicitly will do this as part of the Thread destructor. For other threads it didn't seem necessary given they would terminate only when the process terminated.
Some non-JavaThreads can actually terminate but, AFAICS, never destroy the associated Thread instance. That would seem to be a bug in itself. If we did ensure the destructor was called on these threads, that would also deal with the removal of the set_thread(NULL). If the threads do not terminate, however, and are still running when the VM terminates, then there is no place to put the "missing" set_thread(NULL). Note this applies to Java threads too: any Java thread still running at VM termination never has set_thread(NULL) called.

So I'm still somewhat at a loss to understand when a "missing" set_thread(NULL) can cause a problem at VM termination.

Thanks,
David
FYI I filed: https://bugs.openjdk.java.net/browse/JDK-8154715

David
-----

On 21/03/2016 10:45 AM, David Holmes wrote:
A further followup/clarification ...
On 20/03/2016 4:46 PM, Brian Gardner wrote:
Yes, I was referring to the call to ThreadLocalStorage::set_thread(NULL).
The problem I'm trying to resolve is the pthread destructors being called repeatedly. On Linux this isn’t a problem, because the destructor is called 4 times and then silently gives up. On FreeBSD the destructor is also called 4 times, but then a warning is printed to stderr, which is a problem even though the warning itself is harmless.
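To make the Linux/FreeBSD difference concrete, here is a minimal C sketch (plain pthreads, not HotSpot code) of a key whose destructor puts its value back, in the spirit of restore_thread_pointer. A thread that exits with the slot still set triggers repeated destructor calls, up to the implementation's iteration limit (PTHREAD_DESTRUCTOR_ITERATIONS; POSIX guarantees at least 4 attempts), while a thread that sets the slot to NULL before exiting triggers none. The names demo_key, restoring_dtor and count_destructor_calls are illustrative assumptions, and the cap of 64 restores exists only so the sketch cannot loop forever on an implementation that keeps retrying.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Safety cap: stop restoring after this many calls so an implementation
   that keeps retrying cannot make the sketch loop forever. */
#define MAX_RESTORES 64

static pthread_key_t demo_key;
static int dtor_calls;          /* written by the exiting worker, read after join */
static int dummy_thread_object; /* stands in for the VM's Thread object */

/* Destructor that puts the value back, like restore_thread_pointer(). */
static void restoring_dtor(void *p) {
    dtor_calls++;
    if (dtor_calls < MAX_RESTORES) {
        pthread_setspecific(demo_key, p);
    }
}

static void *worker(void *arg) {
    int clear_before_exit = *(const int *)arg;
    pthread_setspecific(demo_key, &dummy_thread_object);
    if (clear_before_exit) {
        /* Slots holding NULL are skipped at thread exit, so the
           destructor never runs for this thread. */
        pthread_setspecific(demo_key, NULL);
    }
    return NULL;
}

/* Starts one worker, waits for it, and reports how often the destructor ran. */
int count_destructor_calls(int clear_before_exit) {
    pthread_t t;
    int rc = pthread_key_create(&demo_key, restoring_dtor);
    assert(rc == 0);
    dtor_calls = 0;
    rc = pthread_create(&t, NULL, worker, &clear_before_exit);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_key_delete(demo_key);
    return dtor_calls;
}
```

count_destructor_calls(0) reports how many destructor rounds the C library attempted before giving up; on FreeBSD this is the kind of exit that should correspond to the "leftover thread-specific data" warning. count_destructor_calls(1) should report 0.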
The message below from the original thread states that, with regard to the initial commit, threads fall into one of three scenarios. The third scenario is the problem case I just mentioned: it is OK on Linux, but it isn’t OK on FreeBSD because of the warnings to stderr.
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010...
I didn’t articulate it very well, but I was trying to say that calling ThreadLocalStorage::set_thread(NULL) during thread cleanup is the proper way to prevent the destructor from being called repeatedly. I actually can’t think of an alternative way to do this.
In openjdk8 all threads clean up after themselves by calling ThreadLocalStorage::set_thread(NULL). I’m going to do some more research to locate the change sets for these calls. Perhaps they are isolated to the bsd-port branch. I’ll let everyone know what I find.
Ah, now I see what you are referring to. In JDK 8 we call set_thread(NULL) as part of the Thread destructor. We also call it explicitly on the VMThread because its destructor is never executed. In JDK 9 we also call set_thread(NULL) as part of the Thread destructor (now inside Thread::clear_thread_current), but the VMThread does not call this. That was a change I introduced with:
https://bugs.openjdk.java.net/browse/JDK-8132510 "Replace ThreadLocalStorage with compiler/language-based thread-local variables"
I did not see the point in clearing TLS for the VMThread, because the primary purpose of that was to allow for threads detaching or terminating, and the VMThread does neither. I don't quite understand how the TLS destructor gets involved in this case: does process termination call TLS destructors on existing threads?
Anyway, it should be harmless to add back a call to Thread::clear_thread_current() where previously we had:
    // Thread destructor usually does this.
    ThreadLocalStorage::set_thread(NULL);
I will file a bug for that.
Thanks, David -----
Kind regards, Brian Gardner
On Mar 18, 2016, at 2:57 PM, David Holmes <david.holmes@oracle.com> wrote:
On 19/03/2016 2:58 AM, Brian Gardner wrote:
That explains the destructor. Looking at the initial change set that came out of this bug, we also see the first spot where the TLS current thread is set to NULL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/469835cd5494
Are you referring to:
+ // Thread destructor usually does this.
+ ThreadLocalStorage::set_thread(NULL);
? That's only there because the VMThread destructor is never called.
So I think it’s safe to say that setting the TLS current thread to NULL is the correct way to mark the state as "destroying thread" and to prevent the destructor from looping indefinitely.
I'm not sure exactly what you mean.
Here is the relevant comment from Andreas:
--------------------------------
I did find a way to change the JVM to workaround this problem: By creating a destructor for the thread pointer TLS we can restore the value after pthread has set it to NULL. Then when the native code destructor is run the thread pointer is still intact.
Restoring a value in a pthread TLS is explicitly supported according to the man page for pthread_key_create, and it will call the destructor for the restored value again. One would have to keep some extra state to make sure the destructor is only called twice, since a pthread implementation is allowed to call the destructor an unlimited number of times as long as the value is restored.
On my system pthread calls the destructor a maximum of four times, so the attached JVM patch was sufficient as a proof of concept.
————————————————
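Andreas's suggestion of keeping extra state so the destructor runs at most twice could look roughly like the C sketch below. It is only an illustration under stated assumptions: struct thread_record, record_key, restore_once_dtor and bounded_restore_calls are hypothetical names, the record passed in as the slot value is not the VM's real Thread object, and, as David's reply says, the patch that went in did not add such a bound. Because POSIX requires at least PTHREAD_DESTRUCTOR_ITERATIONS (at least 4) rounds, restoring exactly once should yield exactly two destructor calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread record standing in for the VM's Thread object. */
struct thread_record {
    int dtor_calls; /* how often the TSD destructor ran */
    int restored;   /* the "extra state": value already put back once? */
};

static pthread_key_t record_key;

/* Restore the value at most once, so the destructor runs at most twice. */
static void restore_once_dtor(void *p) {
    struct thread_record *rec = p;
    rec->dtor_calls++;
    if (!rec->restored) {
        rec->restored = 1;
        pthread_setspecific(record_key, rec); /* second and final chance */
    }
}

static void *record_worker(void *arg) {
    pthread_setspecific(record_key, arg);
    return NULL; /* exits without clearing the slot */
}

/* Runs one thread that leaves its slot set; returns the destructor count. */
int bounded_restore_calls(void) {
    struct thread_record rec = {0, 0};
    pthread_t t;
    int rc = pthread_key_create(&record_key, restore_once_dtor);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, record_worker, &rec);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_key_delete(record_key);
    return rec.dtor_calls;
}
```

The record lives on the joining thread's stack, so it is still valid when the worker's destructor updates it, and pthread_join makes the final count visible to the caller.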
We implemented the basic patch; we don't do anything to ensure it is called at most twice. We expect all well-behaved apps to detach threads from the JVM before they terminate.
David -----
On Mar 17, 2016, at 9:02 PM, David Holmes <david.holmes@oracle.com> wrote:
Thomas writes:
Hi Brian,
The next patches were less straightforward. When running java I was getting a ton of messages like: "Thread 832744400 has exited with leftover thread-specific data after 4 destructor iterations". After doing a lot of digging and debugging on Linux, I found that the code path for Linux was identical to FreeBSD's and that the cleanup destructor was being executed 4 times, just as on FreeBSD; the difference is that FreeBSD prints out this benign warning while Linux silently ignores it. The problem is that all threads that are created and initialize the TLS current thread data must clean it up by explicitly setting the TLS current thread to null. I’ve come up with two approaches to accomplish this.
Clean up the TLS current thread at the end of the ::run functions, similar to how it's done in openjdk8:
http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/
Clear the current thread before exiting java_start to avoid warnings from leftover pthread_setspecific data:
http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/web...
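The second approach, which clears the slot in one common start routine instead of at the end of every run function, can be sketched in plain C as below. This is a hypothetical analogue, not the actual java_start or webrev code: struct fake_thread, example_run, thread_start, start_and_join and current_key are invented names. The destructor again restores its value (with a safety cap), so any data left behind would show up as a non-zero call count.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t current_key;
static int dtor_calls; /* how often the leftover-data destructor ran */

/* Hypothetical per-thread object and body, standing in for Thread and ::run(). */
struct fake_thread {
    void (*run)(struct fake_thread *self);
    int ran;
};

/* Restores its value like restore_thread_pointer(); capped so it cannot loop forever. */
static void restore_current(void *p) {
    dtor_calls++;
    if (dtor_calls < 64) {
        pthread_setspecific(current_key, p);
    }
}

/* Example body; any function with this signature could be plugged in. */
void example_run(struct fake_thread *self) {
    self->ran = 1;
}

/* Common start routine: publish the current thread, run the body, and
   clear the slot once here instead of at the end of every run function. */
static void *thread_start(void *arg) {
    struct fake_thread *self = arg;
    pthread_setspecific(current_key, self);
    self->run(self);
    pthread_setspecific(current_key, NULL); /* nothing left for the destructor */
    return NULL;
}

/* Starts and joins one thread; returns the destructor call count (0 = no leftover). */
int start_and_join(struct fake_thread *t) {
    pthread_t tid;
    int rc = pthread_key_create(&current_key, restore_current);
    assert(rc == 0);
    dtor_calls = 0;
    rc = pthread_create(&tid, NULL, thread_start, t);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_key_delete(current_key);
    return dtor_calls;
}
```

The trade-off between the two approaches: a single pthread_setspecific(current_key, NULL) after run() returns covers every thread type, whereas the first approach needs a matching cleanup line at the end of each ::run function.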
I do not think this is a real leak. From what I remember of how glibc implements TLS, setting the TLS slot value to NULL would not in itself delete anything. In the VM, this slot holds the pointer to the current Thread*, which is correctly deleted at the end of the thread (void JavaThread::thread_main_inner()).
Digging further, I found the pthread key destructor "restore_thread_pointer(void* p)" in threadLocalStorage_posix.cpp:
// Restore the thread pointer if the destructor is called. This is in case
// someone from JNI code sets up a destructor with pthread_key_create to run
// detachCurrentThread on thread death. Unless we restore the thread pointer we
// will hang or crash. When detachCurrentThread is called the key will be set
// to null and we will not be called again. If detachCurrentThread is never
// called we could loop forever depending on the pthread implementation.
extern "C" void restore_thread_pointer(void* p) {
  ThreadLocalStorage::set_thread((Thread*) p);
}
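The scenario that comment guards against can be reproduced with two pthread keys. In this C sketch all names are hypothetical: vm_key plays the VM's current-thread slot, user_key is a key created by native code, and detach_current_thread_demo merely stands in for JNI's DetachCurrentThread without doing any of its real work. POSIX leaves the order of destructor calls across keys unspecified, so the VM destructor may NULL its slot before the user destructor runs; restoring the pointer ensures the user destructor still finds it, and the user's "detach" then clears the slot so the VM destructor is not called again.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t vm_key;   /* plays the role of the VM's current-thread key */
static pthread_key_t user_key; /* key created by (hypothetical) JNI code */
static int vm_dtor_calls;      /* how often the VM-side destructor ran */
static int user_saw_thread;    /* 1 if the user destructor found a current thread */
static int fake_vm_thread;     /* stands in for the VM's Thread object */
static int user_token;         /* any non-NULL value arms user_dtor */

/* Same idea as restore_thread_pointer(): undo the NULLing done at thread exit.
   Capped so the sketch cannot loop forever if nobody ever detaches. */
static void restore_thread_pointer_demo(void *p) {
    vm_dtor_calls++;
    if (vm_dtor_calls < 64) {
        pthread_setspecific(vm_key, p);
    }
}

/* Hypothetical stand-in for DetachCurrentThread(): it needs the current
   thread and clears the VM slot, so the VM destructor is not called again. */
static void detach_current_thread_demo(void) {
    if (pthread_getspecific(vm_key) != NULL) {
        user_saw_thread = 1;
        pthread_setspecific(vm_key, NULL);
    }
}

static void user_dtor(void *unused) {
    (void)unused;
    detach_current_thread_demo();
}

/* A native thread that "attaches" and then exits without detaching. */
static void *native_thread(void *unused) {
    (void)unused;
    pthread_setspecific(vm_key, &fake_vm_thread);
    pthread_setspecific(user_key, &user_token);
    return NULL;
}

/* Returns 1 if the user destructor could still see the current thread. */
int run_detach_demo(void) {
    pthread_t t;
    int rc = pthread_key_create(&vm_key, restore_thread_pointer_demo);
    assert(rc == 0);
    rc = pthread_key_create(&user_key, user_dtor);
    assert(rc == 0);
    vm_dtor_calls = 0;
    user_saw_thread = 0;
    rc = pthread_create(&t, NULL, native_thread, NULL);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_key_delete(user_key);
    pthread_key_delete(vm_key);
    return user_saw_thread;
}
```

If restore_thread_pointer_demo did not call pthread_setspecific, an implementation that happens to run vm_key's destructor first could make run_detach_demo() return 0, a miniature version of the "hang or crash" case the comment describes.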
So it seems we even deliberately reset the thread pointer to a non-NULL value. The comment claims that we reset the Thread* value in case there is another user-provided destructor which runs afterwards and calls detachCurrentThread(), which would require Thread::current() to work. But there are details I do not understand:
- At this point, should the Thread* object not already be deallocated, so this would be a dangling pointer anyway?
- Also, according to POSIX, this is unspecified. The documentation for pthread_setspecific() states: "Calling pthread_setspecific() from a thread-specific data destructor routine may result either in lost storage (after at least PTHREAD_DESTRUCTOR_ITERATIONS attempts at destruction) or in an infinite loop."
- In jdk8, we did reset the slot value to NULL before Thread exit. So, in this case detachCurrentThread() from a pthread_key destructor should not have worked at all.
Could someone from Oracle maybe shed light on this?
Please see the following discussion and bug report:
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010...
Note I don't follow this list so please include me directly in any follow-ups if needed.
Thanks, David
Participants (3): Brian Gardner, David Holmes, Thomas Stüfe