RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

older
Result: New BSD Port Committer:...

David Holmes

3 May 2016 3 May '16

11:39 p.m.

This needs attention from GC and runtime folk please. bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/ tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates. Background: Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should. This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread: // Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL); The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported: http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm... a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux). This indicates we need to restore the: ThreadLocalStorage::set_thread(NULL); but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates. Solution: Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible: - VMThread This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated. - WatcherThread The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem. - GC Threads There are two cases: a) GC threads that never terminate For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate. b) GC threads that can terminate Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates Testing: JPRT RBT - runtime nightly tests Thanks, David

Show replies by date

David Holmes

5 May 5 May

9:01 p.m.

PING! David On 4/05/2016 9:39 AM, David Holmes wrote:

...

This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

Daniel D. Daugherty

10:03 p.m.

On 5/3/16 5:39 PM, David Holmes wrote:

...

This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

src/os/solaris/vm/os_solaris.cpp No comments. (I'm guessing you didn't want to expand the existing guarantee() to cover your additional discovery.) src/share/vm/gc/parallel/gcTaskThread.cpp No comments. src/share/vm/gc/shared/concurrentGCThread.cpp No comments. src/share/vm/gc/shared/workgroup.cpp No comments. src/share/vm/runtime/thread.cpp L1388: if (watcher != NULL) L1389: delete watcher; nit: Please add '{' and '}' or make it a single line if-statement. src/share/vm/runtime/vmThread.cpp No comments. Thumbs up. Only one nit so feel free to ignore it or fix it; I don't need another webrev if you fix it. Dan

...

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

David Holmes

11:42 p.m.

Hi Dan, Thanks for the Review! On 6/05/2016 8:03 AM, Daniel D. Daugherty wrote:

...

On 5/3/16 5:39 PM, David Holmes wrote:

...
This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

src/os/solaris/vm/os_solaris.cpp No comments. (I'm guessing you didn't want to expand the existing guarantee() to cover your additional discovery.)

There's no way to ask "did you use to be the WatcherThread?" because by this point watcherThread() has to return NULL. So simpler to just delete - plus this was the only OS that did this check.

...

src/share/vm/gc/parallel/gcTaskThread.cpp No comments.

src/share/vm/gc/shared/concurrentGCThread.cpp No comments.

src/share/vm/gc/shared/workgroup.cpp No comments.

src/share/vm/runtime/thread.cpp L1388: if (watcher != NULL) L1389: delete watcher; nit: Please add '{' and '}' or make it a single line if-statement.

Will add braces.

...

src/share/vm/runtime/vmThread.cpp No comments.

Thumbs up. Only one nit so feel free to ignore it or fix it; I don't need another webrev if you fix it.

Thanks, David

...

Dan

...
tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

Stefan Karlsson

6 May 6 May

7:02 a.m.

Hi David, I looked through the GC part of this webrev and I think the change is fine. However, it seems a bit error prone. If we decide to change the code to, for example, terminate the AbstractGangWorker threads, then we have to remember to insert a ThreadLocalStorage::set_thread(NULL) call. Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or maybe even Thread::clear_thread_current(), in java_start? static void *java_start(Thread *thread) { [...] thread->initialize_thread_current(); [...] // call one more level start routine thread->run(); ////////// Could we call Thread::clear_thread_current(); here? log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread id: " UINTX_FORMAT ").", os::current_thread_id(), (uintx) pthread_self()); return 0; } And get rid of the explicit calls to ThreadLocalStorage::set_thread(NULL) you added? Thanks, StefanK On 04/05/16 01:39, David Holmes wrote:

...

This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

David Holmes

1:38 p.m.

Hi Stefan, Thanks for taking a look at this. On 6/05/2016 5:02 PM, Stefan Karlsson wrote:

...

Hi David,

I looked through the GC part of this webrev and I think the change is fine.

However, it seems a bit error prone. If we decide to change the code to, for example, terminate the AbstractGangWorker threads, then we have to remember to insert a ThreadLocalStorage::set_thread(NULL) call.

That's why I added the ShouldNotReachHere()'s - if those threads start terminating then we will see those hit. Perhaps a comment: ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup ?

...

Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or maybe even Thread::clear_thread_current(), in java_start?

static void *java_start(Thread *thread) { [...] thread->initialize_thread_current();

[...]

// call one more level start routine thread->run();

////////// Could we call Thread::clear_thread_current(); here?

Not easily. For JavaThreads we've already done "delete this" inside the run() method, so we'd have to move that into java_start as well, but we can only do the delete for JavaThreads not for other threads. And we'd also have to change the VMThread and WatcherThread termination logic because of the deletes that happen in the termination thread - the "this" pointer (thread above) may no longer be valid when we want to call clear_current_thread() - which is why we can only do the ThreadLocalStorage::set_thread(NULL). I agree it would be a lot cleaner to have java_start do: thread->common_initialization(); thread->run(); thread->common_cleanup(); delete thread; for all threads, but we'd need a lot of other changes to allow for that. Otherwise we would need to note that kind of thread before calling run() then switch on the thread type after run() to decide what kind of cleanup is necessary and possible. I don't think that would be better than just doing the "right" cleanup at the end of the run() methods. Thanks, David ------

...

log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread id: " UINTX_FORMAT ").", os::current_thread_id(), (uintx) pthread_self());

return 0; }

And get rid of the explicit calls to ThreadLocalStorage::set_thread(NULL) you added?

Thanks, StefanK

On 04/05/16 01:39, David Holmes wrote:

...
This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

Stefan Karlsson

2:04 p.m.

Hi David, On 06/05/16 15:38, David Holmes wrote:

...

Hi Stefan,

Thanks for taking a look at this.

On 6/05/2016 5:02 PM, Stefan Karlsson wrote:

...
Hi David,

I looked through the GC part of this webrev and I think the change is fine.

However, it seems a bit error prone. If we decide to change the code to, for example, terminate the AbstractGangWorker threads, then we have to remember to insert a ThreadLocalStorage::set_thread(NULL) call.

That's why I added the ShouldNotReachHere()'s - if those threads start terminating then we will see those hit. Perhaps a comment:

ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup

?

Yes, I would appreciate a comment. Though, when we add new threads, we need to remember to add the set_thread(NULL) call.

...

...
Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or maybe even Thread::clear_thread_current(), in java_start?

static void *java_start(Thread *thread) { [...] thread->initialize_thread_current();

[...]

// call one more level start routine thread->run();

////////// Could we call Thread::clear_thread_current(); here?

Not easily. For JavaThreads we've already done "delete this" inside the run() method, so we'd have to move that into java_start as well, but we can only do the delete for JavaThreads not for other threads. And we'd also have to change the VMThread and WatcherThread termination logic because of the deletes that happen in the termination thread - the "this" pointer (thread above) may no longer be valid when we want to call clear_current_thread() - which is why we can only do the ThreadLocalStorage::set_thread(NULL).

I agree it would be a lot cleaner to have java_start do:

thread->common_initialization(); thread->run(); thread->common_cleanup(); delete thread;

for all threads, but we'd need a lot of other changes to allow for that. Otherwise we would need to note that kind of thread before calling run() then switch on the thread type after run() to decide what kind of cleanup is necessary and possible. I don't think that would be better than just doing the "right" cleanup at the end of the run() methods.

I understand that this is a bit messy, and I won't insist that we change this in this RFR, but without looking at this in much detail it sounds weird to delete the thread in run(). Couldn't this be solved by introducing a virtual Thread::post_run() function and do: virtual void Thread::post_run() { clear_thread_current(); } virtual void JavaThread::post_run() { Thread::post_run(); delete this; } Thanks, StefanK

...

Thanks, David ------

...
log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread id: " UINTX_FORMAT ").", os::current_thread_id(), (uintx) pthread_self());

return 0; }

And get rid of the explicit calls to ThreadLocalStorage::set_thread(NULL) you added?

Thanks, StefanK

On 04/05/16 01:39, David Holmes wrote:

...
This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

David Holmes

2:32 p.m.

On 7/05/2016 12:04 AM, Stefan Karlsson wrote:

...

Hi David,

On 06/05/16 15:38, David Holmes wrote:

...
Hi Stefan,

Thanks for taking a look at this.

On 6/05/2016 5:02 PM, Stefan Karlsson wrote:

...
Hi David,

I looked through the GC part of this webrev and I think the change is fine.

However, it seems a bit error prone. If we decide to change the code to, for example, terminate the AbstractGangWorker threads, then we have to remember to insert a ThreadLocalStorage::set_thread(NULL) call.

That's why I added the ShouldNotReachHere()'s - if those threads start terminating then we will see those hit. Perhaps a comment:

ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup

?

Yes, I would appreciate a comment. Though, when we add new threads, we need to remember to add the set_thread(NULL) call.

Well no, what you would do is manage your new threads in such a way that their run() method can do "delete this" as the last call. Only if you can't do that do you need to think about what termination logic is missing that needs to be done in lieu of the destructor.

...

...
...
Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or maybe even Thread::clear_thread_current(), in java_start?

static void *java_start(Thread *thread) { [...] thread->initialize_thread_current();

[...]

// call one more level start routine thread->run();

////////// Could we call Thread::clear_thread_current(); here?

Not easily. For JavaThreads we've already done "delete this" inside the run() method, so we'd have to move that into java_start as well, but we can only do the delete for JavaThreads not for other threads. And we'd also have to change the VMThread and WatcherThread termination logic because of the deletes that happen in the termination thread - the "this" pointer (thread above) may no longer be valid when we want to call clear_current_thread() - which is why we can only do the ThreadLocalStorage::set_thread(NULL).

I agree it would be a lot cleaner to have java_start do:

thread->common_initialization(); thread->run(); thread->common_cleanup(); delete thread;

for all threads, but we'd need a lot of other changes to allow for that. Otherwise we would need to note that kind of thread before calling run() then switch on the thread type after run() to decide what kind of cleanup is necessary and possible. I don't think that would be better than just doing the "right" cleanup at the end of the run() methods.

I understand that this is a bit messy, and I won't insist that we change this in this RFR, but without looking at this in much detail it sounds weird to delete the thread in run(). Couldn't this be solved by introducing a virtual Thread::post_run() function and do:

virtual void Thread::post_run() { clear_thread_current(); }

virtual void JavaThread::post_run() { Thread::post_run(); delete this; }

But again this can't work for the VMThread or WatcherThread as they are deleted from the termination thread and so thread->post_run() may SEGV.** Plus it is only after the fact that you realize not to put "delete this" in Thread::post_run(). ** Arguably the best solution to the "thread termination races with VM termination" problem is to not let the threads terminate. The code as it exists today can still have JavaThreads destroying themselves at the same that the VM is terminating and potentially hit the same errors that require us to not allow the VMThread (and now WatcherThread) to delete themselves. Thanks, David

...

Thanks, StefanK

...
Thanks, David ------

...
log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread id: " UINTX_FORMAT ").", os::current_thread_id(), (uintx) pthread_self());

return 0; }

And get rid of the explicit calls to ThreadLocalStorage::set_thread(NULL) you added?

Thanks, StefanK

On 04/05/16 01:39, David Holmes wrote:

...
This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

Stefan Karlsson

3:41 p.m.

On 06/05/16 16:32, David Holmes wrote:

...

On 7/05/2016 12:04 AM, Stefan Karlsson wrote:

...
Hi David,

On 06/05/16 15:38, David Holmes wrote:

...
Hi Stefan,

Thanks for taking a look at this.

On 6/05/2016 5:02 PM, Stefan Karlsson wrote:

...
Hi David,

I looked through the GC part of this webrev and I think the change is fine.

However, it seems a bit error prone. If we decide to change the code to, for example, terminate the AbstractGangWorker threads, then we have to remember to insert a ThreadLocalStorage::set_thread(NULL) call.

That's why I added the ShouldNotReachHere()'s - if those threads start terminating then we will see those hit. Perhaps a comment:

ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup

?

Yes, I would appreciate a comment. Though, when we add new threads, we need to remember to add the set_thread(NULL) call.

Well no, what you would do is manage your new threads in such a way that their run() method can do "delete this" as the last call. Only if you can't do that do you need to think about what termination logic is missing that needs to be done in lieu of the destructor.

Yes, but this forces every implementer of a Thread:run() function to have to think about these kind of requirements.

...

...
...
...
Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or maybe even Thread::clear_thread_current(), in java_start?

static void *java_start(Thread *thread) { [...] thread->initialize_thread_current();

[...]

// call one more level start routine thread->run();

////////// Could we call Thread::clear_thread_current(); here?

Not easily. For JavaThreads we've already done "delete this" inside the run() method, so we'd have to move that into java_start as well, but we can only do the delete for JavaThreads not for other threads. And we'd also have to change the VMThread and WatcherThread termination logic because of the deletes that happen in the termination thread - the "this" pointer (thread above) may no longer be valid when we want to call clear_current_thread() - which is why we can only do the ThreadLocalStorage::set_thread(NULL).

I agree it would be a lot cleaner to have java_start do:

thread->common_initialization(); thread->run(); thread->common_cleanup(); delete thread;

for all threads, but we'd need a lot of other changes to allow for that. Otherwise we would need to note that kind of thread before calling run() then switch on the thread type after run() to decide what kind of cleanup is necessary and possible. I don't think that would be better than just doing the "right" cleanup at the end of the run() methods.

I understand that this is a bit messy, and I won't insist that we change this in this RFR, but without looking at this in much detail it sounds weird to delete the thread in run(). Couldn't this be solved by introducing a virtual Thread::post_run() function and do:

virtual void Thread::post_run() { clear_thread_current(); }

virtual void JavaThread::post_run() { Thread::post_run(); delete this; }

But again this can't work for the VMThread or WatcherThread as they are deleted from the termination thread and so thread->post_run() may SEGV.** Plus it is only after the fact that you realize not to put "delete this" in Thread::post_run().

OK, I didn't understand what you meant with "termination thread", but I now see the call to VMThread::destroy(). With that said, I find it odd that VMThread::destroy() deletes the VM thread. We already handshake between the VMThread and the "termination thread", so why isn't that VMThread::post_run() implemented as: virtual void VMThread::post_run() { // signal other threads that VM process is gone { // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows // VM thread to enter any lock at Safepoint as long as its _owner is NULL. // If that happens after _terminate_lock->wait() has unset _owner // but before it actually drops the lock and waits, the notification below // may get lost and we will have a hang. To avoid this, we need to use // Mutex::lock_without_safepoint_check(). MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag); _terminated = true; _terminate_lock->notify(); } Thread::post_run(); delete this; } And then we wouldn't get a SEGV ... I couldn't find the destructor for the WatchThread, but it seems easy to fix that as well. I'm probably missing something, but I find it a bit annoying that code that should belong to the *Thread:ing system leaks into the implementations of *Thread::run(). Thanks, StefanK

...

** Arguably the best solution to the "thread termination races with VM termination" problem is to not let the threads terminate. The code as it exists today can still have JavaThreads destroying themselves at the same that the VM is terminating and potentially hit the same errors that require us to not allow the VMThread (and now WatcherThread) to delete themselves.

Thanks, David

...
Thanks, StefanK

...
Thanks, David ------

...
log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread id: " UINTX_FORMAT ").", os::current_thread_id(), (uintx) pthread_self());

return 0; }

And get rid of the explicit calls to ThreadLocalStorage::set_thread(NULL) you added?

Thanks, StefanK

On 04/05/16 01:39, David Holmes wrote:

...
This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

Brian Gardner

7:40 p.m.

I agree with Stefan. When I initially ran into the problem I came up with the following changeset that solves my problem, by calling clear_thread_current at the end of java_start. http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/web... <http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/webrev/> Brian

...

On May 6, 2016, at 8:41 AM, Stefan Karlsson <stefan.karlsson@oracle.com> wrote:

On 06/05/16 16:32, David Holmes wrote:

...
On 7/05/2016 12:04 AM, Stefan Karlsson wrote:

...
Hi David,

On 06/05/16 15:38, David Holmes wrote:

...
Hi Stefan,

Thanks for taking a look at this.

On 6/05/2016 5:02 PM, Stefan Karlsson wrote:

...
Hi David,

I looked through the GC part of this webrev and I think the change is fine.

However, it seems a bit error prone. If we decide to change the code to, for example, terminate the AbstractGangWorker threads, then we have to remember to insert a ThreadLocalStorage::set_thread(NULL) call.

That's why I added the ShouldNotReachHere()'s - if those threads start terminating then we will see those hit. Perhaps a comment:

ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup

?

Yes, I would appreciate a comment. Though, when we add new threads, we need to remember to add the set_thread(NULL) call.

Well no, what you would do is manage your new threads in such a way that their run() method can do "delete this" as the last call. Only if you can't do that do you need to think about what termination logic is missing that needs to be done in lieu of the destructor.

Yes, but this forces every implementer of a Thread:run() function to have to think about these kind of requirements.

...
...
...
...
Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or maybe even Thread::clear_thread_current(), in java_start?

static void *java_start(Thread *thread) { [...] thread->initialize_thread_current();

[...]

// call one more level start routine thread->run();

////////// Could we call Thread::clear_thread_current(); here?

Not easily. For JavaThreads we've already done "delete this" inside the run() method, so we'd have to move that into java_start as well, but we can only do the delete for JavaThreads not for other threads. And we'd also have to change the VMThread and WatcherThread termination logic because of the deletes that happen in the termination thread - the "this" pointer (thread above) may no longer be valid when we want to call clear_current_thread() - which is why we can only do the ThreadLocalStorage::set_thread(NULL).

I agree it would be a lot cleaner to have java_start do:

thread->common_initialization(); thread->run(); thread->common_cleanup(); delete thread;

for all threads, but we'd need a lot of other changes to allow for that. Otherwise we would need to note that kind of thread before calling run() then switch on the thread type after run() to decide what kind of cleanup is necessary and possible. I don't think that would be better than just doing the "right" cleanup at the end of the run() methods.

I understand that this is a bit messy, and I won't insist that we change this in this RFR, but without looking at this in much detail it sounds weird to delete the thread in run(). Couldn't this be solved by introducing a virtual Thread::post_run() function and do:

virtual void Thread::post_run() { clear_thread_current(); }

virtual void JavaThread::post_run() { Thread::post_run(); delete this; }

But again this can't work for the VMThread or WatcherThread as they are deleted from the termination thread and so thread->post_run() may SEGV.** Plus it is only after the fact that you realize not to put "delete this" in Thread::post_run().

OK, I didn't understand what you meant with "termination thread", but I now see the call to VMThread::destroy().

With that said, I find it odd that VMThread::destroy() deletes the VM thread. We already handshake between the VMThread and the "termination thread", so why isn't that VMThread::post_run() implemented as:

virtual void VMThread::post_run() { // signal other threads that VM process is gone { // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows // VM thread to enter any lock at Safepoint as long as its _owner is NULL. // If that happens after _terminate_lock->wait() has unset _owner // but before it actually drops the lock and waits, the notification below // may get lost and we will have a hang. To avoid this, we need to use // Mutex::lock_without_safepoint_check(). MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag); _terminated = true; _terminate_lock->notify(); }

Thread::post_run(); delete this; }

And then we wouldn't get a SEGV ...

I couldn't find the destructor for the WatchThread, but it seems easy to fix that as well.

I'm probably missing something, but I find it a bit annoying that code that should belong to the *Thread:ing system leaks into the implementations of *Thread::run().

Thanks, StefanK

...
** Arguably the best solution to the "thread termination races with VM termination" problem is to not let the threads terminate. The code as it exists today can still have JavaThreads destroying themselves at the same that the VM is terminating and potentially hit the same errors that require us to not allow the VMThread (and now WatcherThread) to delete themselves.

Thanks, David

...
Thanks, StefanK

...
Thanks, David ------

...
log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread id: " UINTX_FORMAT ").", os::current_thread_id(), (uintx) pthread_self());

return 0; }

And get rid of the explicit calls to ThreadLocalStorage::set_thread(NULL) you added?

Thanks, StefanK

On 04/05/16 01:39, David Holmes wrote:

...
This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

David Holmes

11:50 p.m.

On 7/05/2016 5:40 AM, Brian Gardner wrote:

...

I agree with Stefan. When I initially ran into the problem I came up with the following changeset that solves my problem, by calling clear_thread_current at the end of java_start.

http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/web...

As I said I can see the appeal in doing this, but there is still a race for the threads destroyed at VM shutdown as current_or_null() can return non-NULL but the delete can happen before we call thread->clear_current_thread(). You would also want Thread::current_or_null_safe() I think, due to already deleted JavaThreads. And this all side-steps the real issue in my opinion that with clean thread lifecycle management we should always be able to delete the terminating thread. So to me it is a question of selecting which set of bandaids you want to apply. Thanks, David

...

Brian

...
On May 6, 2016, at 8:41 AM, Stefan Karlsson <stefan.karlsson@oracle.com <mailto:stefan.karlsson@oracle.com>> wrote:

On 06/05/16 16:32, David Holmes wrote:

...
On 7/05/2016 12:04 AM, Stefan Karlsson wrote:

...
Hi David,

On 06/05/16 15:38, David Holmes wrote:

...
Hi Stefan,

Thanks for taking a look at this.

On 6/05/2016 5:02 PM, Stefan Karlsson wrote:

...
Hi David,

I looked through the GC part of this webrev and I think the change is fine.

However, it seems a bit error prone. If we decide to change the code to, for example, terminate the AbstractGangWorker threads, then we have to remember to insert a ThreadLocalStorage::set_thread(NULL) call.

That's why I added the ShouldNotReachHere()'s - if those threads start terminating then we will see those hit. Perhaps a comment:

ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup

?

Yes, I would appreciate a comment. Though, when we add new threads, we need to remember to add the set_thread(NULL) call.

Well no, what you would do is manage your new threads in such a way that their run() method can do "delete this" as the last call. Only if you can't do that do you need to think about what termination logic is missing that needs to be done in lieu of the destructor.

Yes, but this forces every implementer of a Thread:run() function to have to think about these kind of requirements.

...
...
...
...
Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or maybe even Thread::clear_thread_current(), in java_start?

static void *java_start(Thread *thread) { [...] thread->initialize_thread_current();

[...]

// call one more level start routine thread->run();

////////// Could we call Thread::clear_thread_current(); here?

Not easily. For JavaThreads we've already done "delete this" inside the run() method, so we'd have to move that into java_start as well, but we can only do the delete for JavaThreads not for other threads. And we'd also have to change the VMThread and WatcherThread termination logic because of the deletes that happen in the termination thread - the "this" pointer (thread above) may no longer be valid when we want to call clear_current_thread() - which is why we can only do the ThreadLocalStorage::set_thread(NULL).

I agree it would be a lot cleaner to have java_start do:

thread->common_initialization(); thread->run(); thread->common_cleanup(); delete thread;

for all threads, but we'd need a lot of other changes to allow for that. Otherwise we would need to note that kind of thread before calling run() then switch on the thread type after run() to decide what kind of cleanup is necessary and possible. I don't think that would be better than just doing the "right" cleanup at the end of the run() methods.

I understand that this is a bit messy, and I won't insist that we change this in this RFR, but without looking at this in much detail it sounds weird to delete the thread in run(). Couldn't this be solved by introducing a virtual Thread::post_run() function and do:

virtual void Thread::post_run() { clear_thread_current(); }

virtual void JavaThread::post_run() { Thread::post_run(); delete this; }

But again this can't work for the VMThread or WatcherThread as they are deleted from the termination thread and so thread->post_run() may SEGV.** Plus it is only after the fact that you realize not to put "delete this" in Thread::post_run().

OK, I didn't understand what you meant with "termination thread", but I now see the call to VMThread::destroy().

With that said, I find it odd that VMThread::destroy() deletes the VM thread. We already handshake between the VMThread and the "termination thread", so why isn't that VMThread::post_run() implemented as:

virtual void VMThread::post_run() { // signal other threads that VM process is gone { // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows // VM thread to enter any lock at Safepoint as long as its _owner is NULL. // If that happens after _terminate_lock->wait() has unset _owner // but before it actually drops the lock and waits, the notification below // may get lost and we will have a hang. To avoid this, we need to use // Mutex::lock_without_safepoint_check(). MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag); _terminated = true; _terminate_lock->notify(); }

Thread::post_run(); delete this; }

And then we wouldn't get a SEGV ...

I couldn't find the destructor for the WatchThread, but it seems easy to fix that as well.

I'm probably missing something, but I find it a bit annoying that code that should belong to the *Thread:ing system leaks into the implementations of *Thread::run().

Thanks, StefanK

...
** Arguably the best solution to the "thread termination races with VM termination" problem is to not let the threads terminate. The code as it exists today can still have JavaThreads destroying themselves at the same that the VM is terminating and potentially hit the same errors that require us to not allow the VMThread (and now WatcherThread) to delete themselves.

Thanks, David

...
Thanks, StefanK

...
Thanks, David ------

...
log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread id: " UINTX_FORMAT ").", os::current_thread_id(), (uintx) pthread_self());

return 0; }

And get rid of the explicit calls to ThreadLocalStorage::set_thread(NULL) you added?

Thanks, StefanK

On 04/05/16 01:39, David Holmes wrote: > This needs attention from GC and runtime folk please. > > bug: https://bugs.openjdk.java.net/browse/JDK-8154715 > webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/ > > tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called > before a thread terminates. > > Background: > > Most system-related threads do not expect to explicitly terminate, > except sometimes as part of VM termination. Such threads don't have > their destructors called, but should. > > This omission came to light due to the ThreadLocalStorage changes in > JDK-8132510. As part of that change we deleted the following from the > termination path of the VMThread: > > // Thread destructor usually does this. > ThreadLocalStorage::set_thread(NULL); > > The clearing of TLS seemed irrelevant to the VMThread as it primarily > is used to aid in JNI attach/detach. However Brian Gardner reported: > > http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm... > > > > a problem on FreeBSD caused by this change and the interaction with > the POSIX pthread TLS destructor use introduced by JDK-8033696. > Because the VMThread terminated without clearing TLS, when the > TLS-destructor was called it got into a loop which ran four times (as > happens on Linux) and then prints a warning to the console (which > doesn't happen on Linux). > > This indicates we need to restore the: > > ThreadLocalStorage::set_thread(NULL); > > but on further consideration it seems to me that this is not confined > to the VMThread, and the most appropriate fix would be to always > invoke the Thread destructor as a thread terminates. > > Solution: > > Further investigation shows that calling the Thread destructor in the > thread as it terminates is not possible: > > - VMThread > > This is actually destroyed by the thread that terminates the VM, but > that can happen after it terminates and so we still hit the TLS > problem. The VMThread may be able to destroy itself today but in the > past this was not possible (see existing code comment), and in the > future it may also not be possible - the problem is that the Thread > destructor can interact with other VM subsystems that are > concurrently > being torn down by the thread that is terminating the VM. In the past > this was the CodeHeap. So rather than introduce something that is > fragile we stick with the current scheme but restore the > ThreadLocalStorage::set_thread(NULL); - note we can't access > "this" at > that time because it may already have been de-allocated. > > - WatcherThread > > The WatcherThread is never destroyed today but has the same > problem as > the VMThread. We can call the destructor from the VM termination > thread (and have implemented that), but not from the WatcherThread > itself. So again we just have to restore the > ThreadLocalStorage::set_thread(NULL); to fix the potential TLS > problem. > > - GC Threads > > There are two cases: > > a) GC threads that never terminate > > For these we don't need to do anything: we can't delete the thread as > it never terminates and we don't hit the TLS problem because it never > terminates. So all we will do here is add some logic to check (in > NON_PRODUCT) that we do in fact never terminate. > > b) GC threads that can terminate > > Despite the fact the threads can terminate, references to those > threads are stored elsewhere (WorkGangs and other places) and are not > cleared as part of the termination process. Those references can be > touched after the thread has terminated so we can not call the > destructor at all. So again all we can do (without some major thread > management reworking) is ensure that > ThreadLocalStorage::set_thread(NULL); is called before the thread > actually terminates > > Testing: JPRT > RBT - runtime nightly tests > > Thanks, > David

David Holmes

7 May 7 May

5:45 a.m.

Brian, Stefan, I've been giving this more thought and I think you are both right. While this can't be completely handled in java_start for all threads it can handle the most common cases. If I don't add the deletion of the WatcherThread then the problematic case reduces to that of the VMThread. I'll do a new version on Monday. I also need to double-check those threads that don't use java_start (which I am sorely tempted to rename). Thanks, David On 7/05/2016 9:50 AM, David Holmes wrote:

...

On 7/05/2016 5:40 AM, Brian Gardner wrote:

...
I agree with Stefan. When I initially ran into the problem I came up with the following changeset that solves my problem, by calling clear_thread_current at the end of java_start.

http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/web...

As I said I can see the appeal in doing this, but there is still a race for the threads destroyed at VM shutdown as current_or_null() can return non-NULL but the delete can happen before we call thread->clear_current_thread().

You would also want Thread::current_or_null_safe() I think, due to already deleted JavaThreads.

And this all side-steps the real issue in my opinion that with clean thread lifecycle management we should always be able to delete the terminating thread.

So to me it is a question of selecting which set of bandaids you want to apply.

Thanks, David

...
Brian

...
On May 6, 2016, at 8:41 AM, Stefan Karlsson <stefan.karlsson@oracle.com <mailto:stefan.karlsson@oracle.com>> wrote:

On 06/05/16 16:32, David Holmes wrote:

...
On 7/05/2016 12:04 AM, Stefan Karlsson wrote:

...
Hi David,

On 06/05/16 15:38, David Holmes wrote:

...
Hi Stefan,

Thanks for taking a look at this.

On 6/05/2016 5:02 PM, Stefan Karlsson wrote: > Hi David, > > I looked through the GC part of this webrev and I think the > change is > fine. > > However, it seems a bit error prone. If we decide to change the > code to, > for example, terminate the AbstractGangWorker threads, then we > have to > remember to insert a ThreadLocalStorage::set_thread(NULL) call.

That's why I added the ShouldNotReachHere()'s - if those threads start terminating then we will see those hit. Perhaps a comment:

ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup

?

Yes, I would appreciate a comment. Though, when we add new threads, we need to remember to add the set_thread(NULL) call.

Well no, what you would do is manage your new threads in such a way that their run() method can do "delete this" as the last call. Only if you can't do that do you need to think about what termination logic is missing that needs to be done in lieu of the destructor.

Yes, but this forces every implementer of a Thread:run() function to have to think about these kind of requirements.

...
...
...
> Could we instead add a call to > ThreadLocalStorage::set_thread(NULL), or > maybe even Thread::clear_thread_current(), in java_start? > > static void *java_start(Thread *thread) { > [...] > thread->initialize_thread_current(); > > [...] > > // call one more level start routine > thread->run(); > > ////////// Could we call Thread::clear_thread_current(); here?

Not easily. For JavaThreads we've already done "delete this" inside the run() method, so we'd have to move that into java_start as well, but we can only do the delete for JavaThreads not for other threads. And we'd also have to change the VMThread and WatcherThread termination logic because of the deletes that happen in the termination thread - the "this" pointer (thread above) may no longer be valid when we want to call clear_current_thread() - which is why we can only do the ThreadLocalStorage::set_thread(NULL).

I agree it would be a lot cleaner to have java_start do:

thread->common_initialization(); thread->run(); thread->common_cleanup(); delete thread;

for all threads, but we'd need a lot of other changes to allow for that. Otherwise we would need to note that kind of thread before calling run() then switch on the thread type after run() to decide what kind of cleanup is necessary and possible. I don't think that would be better than just doing the "right" cleanup at the end of the run() methods.

I understand that this is a bit messy, and I won't insist that we change this in this RFR, but without looking at this in much detail it sounds weird to delete the thread in run(). Couldn't this be solved by introducing a virtual Thread::post_run() function and do:

virtual void Thread::post_run() { clear_thread_current(); }

virtual void JavaThread::post_run() { Thread::post_run(); delete this; }

But again this can't work for the VMThread or WatcherThread as they are deleted from the termination thread and so thread->post_run() may SEGV.** Plus it is only after the fact that you realize not to put "delete this" in Thread::post_run().

OK, I didn't understand what you meant with "termination thread", but I now see the call to VMThread::destroy().

With that said, I find it odd that VMThread::destroy() deletes the VM thread. We already handshake between the VMThread and the "termination thread", so why isn't that VMThread::post_run() implemented as:

virtual void VMThread::post_run() { // signal other threads that VM process is gone { // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows // VM thread to enter any lock at Safepoint as long as its _owner is NULL. // If that happens after _terminate_lock->wait() has unset _owner // but before it actually drops the lock and waits, the notification below // may get lost and we will have a hang. To avoid this, we need to use // Mutex::lock_without_safepoint_check(). MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag); _terminated = true; _terminate_lock->notify(); }

Thread::post_run(); delete this; }

And then we wouldn't get a SEGV ...

I couldn't find the destructor for the WatchThread, but it seems easy to fix that as well.

I'm probably missing something, but I find it a bit annoying that code that should belong to the *Thread:ing system leaks into the implementations of *Thread::run().

Thanks, StefanK

...
** Arguably the best solution to the "thread termination races with VM termination" problem is to not let the threads terminate. The code as it exists today can still have JavaThreads destroying themselves at the same that the VM is terminating and potentially hit the same errors that require us to not allow the VMThread (and now WatcherThread) to delete themselves.

Thanks, David

...
Thanks, StefanK

...
Thanks, David ------

> > log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", > pthread > id: " UINTX_FORMAT ").", > os::current_thread_id(), (uintx) pthread_self()); > > return 0; > } > > And get rid of the explicit calls to > ThreadLocalStorage::set_thread(NULL) you added? > > Thanks, > StefanK > > On 04/05/16 01:39, David Holmes wrote: >> This needs attention from GC and runtime folk please. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8154715 >> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/ >> >> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called >> before a thread terminates. >> >> Background: >> >> Most system-related threads do not expect to explicitly terminate, >> except sometimes as part of VM termination. Such threads don't have >> their destructors called, but should. >> >> This omission came to light due to the ThreadLocalStorage >> changes in >> JDK-8132510. As part of that change we deleted the following >> from the >> termination path of the VMThread: >> >> // Thread destructor usually does this. >> ThreadLocalStorage::set_thread(NULL); >> >> The clearing of TLS seemed irrelevant to the VMThread as it >> primarily >> is used to aid in JNI attach/detach. However Brian Gardner >> reported: >> >> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm... >> >> >> >> >> a problem on FreeBSD caused by this change and the interaction with >> the POSIX pthread TLS destructor use introduced by JDK-8033696. >> Because the VMThread terminated without clearing TLS, when the >> TLS-destructor was called it got into a loop which ran four >> times (as >> happens on Linux) and then prints a warning to the console (which >> doesn't happen on Linux). >> >> This indicates we need to restore the: >> >> ThreadLocalStorage::set_thread(NULL); >> >> but on further consideration it seems to me that this is not >> confined >> to the VMThread, and the most appropriate fix would be to always >> invoke the Thread destructor as a thread terminates. >> >> Solution: >> >> Further investigation shows that calling the Thread destructor >> in the >> thread as it terminates is not possible: >> >> - VMThread >> >> This is actually destroyed by the thread that terminates the VM, >> but >> that can happen after it terminates and so we still hit the TLS >> problem. The VMThread may be able to destroy itself today but in >> the >> past this was not possible (see existing code comment), and in the >> future it may also not be possible - the problem is that the Thread >> destructor can interact with other VM subsystems that are >> concurrently >> being torn down by the thread that is terminating the VM. In the >> past >> this was the CodeHeap. So rather than introduce something that is >> fragile we stick with the current scheme but restore the >> ThreadLocalStorage::set_thread(NULL); - note we can't access >> "this" at >> that time because it may already have been de-allocated. >> >> - WatcherThread >> >> The WatcherThread is never destroyed today but has the same >> problem as >> the VMThread. We can call the destructor from the VM termination >> thread (and have implemented that), but not from the WatcherThread >> itself. So again we just have to restore the >> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS >> problem. >> >> - GC Threads >> >> There are two cases: >> >> a) GC threads that never terminate >> >> For these we don't need to do anything: we can't delete the >> thread as >> it never terminates and we don't hit the TLS problem because it >> never >> terminates. So all we will do here is add some logic to check (in >> NON_PRODUCT) that we do in fact never terminate. >> >> b) GC threads that can terminate >> >> Despite the fact the threads can terminate, references to those >> threads are stored elsewhere (WorkGangs and other places) and >> are not >> cleared as part of the termination process. Those references can be >> touched after the thread has terminated so we can not call the >> destructor at all. So again all we can do (without some major >> thread >> management reworking) is ensure that >> ThreadLocalStorage::set_thread(NULL); is called before the thread >> actually terminates >> >> Testing: JPRT >> RBT - runtime nightly tests >> >> Thanks, >> David

Stefan Karlsson

9 May 9 May

9:29 a.m.

Hi, For java_start, would it be enough to just, unconditionally, call ThreadLocalStorage::set_thread(NULL) after thread->run() ? StefanK On 2016-05-07 07:45, David Holmes wrote:

...

Brian, Stefan,

I've been giving this more thought and I think you are both right. While this can't be completely handled in java_start for all threads it can handle the most common cases. If I don't add the deletion of the WatcherThread then the problematic case reduces to that of the VMThread.

I'll do a new version on Monday. I also need to double-check those threads that don't use java_start (which I am sorely tempted to rename).

Thanks, David

On 7/05/2016 9:50 AM, David Holmes wrote:

...
On 7/05/2016 5:40 AM, Brian Gardner wrote:

...
I agree with Stefan. When I initially ran into the problem I came up with the following changeset that solves my problem, by calling clear_thread_current at the end of java_start.

http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/web...

As I said I can see the appeal in doing this, but there is still a race for the threads destroyed at VM shutdown as current_or_null() can return non-NULL but the delete can happen before we call thread->clear_current_thread().

You would also want Thread::current_or_null_safe() I think, due to already deleted JavaThreads.

And this all side-steps the real issue in my opinion that with clean thread lifecycle management we should always be able to delete the terminating thread.

So to me it is a question of selecting which set of bandaids you want to apply.

Thanks, David

...
Brian

...
On May 6, 2016, at 8:41 AM, Stefan Karlsson <stefan.karlsson@oracle.com <mailto:stefan.karlsson@oracle.com>> wrote:

On 06/05/16 16:32, David Holmes wrote:

...
On 7/05/2016 12:04 AM, Stefan Karlsson wrote:

...
Hi David,

On 06/05/16 15:38, David Holmes wrote: > Hi Stefan, > > Thanks for taking a look at this. > > On 6/05/2016 5:02 PM, Stefan Karlsson wrote: >> Hi David, >> >> I looked through the GC part of this webrev and I think the >> change is >> fine. >> >> However, it seems a bit error prone. If we decide to change the >> code to, >> for example, terminate the AbstractGangWorker threads, then we >> have to >> remember to insert a ThreadLocalStorage::set_thread(NULL) call. > > That's why I added the ShouldNotReachHere()'s - if those threads > start > terminating then we will see those hit. Perhaps a comment: > > ShouldNotReachHere(); // If thread terminates we have to do TLS > cleanup > > ?

Yes, I would appreciate a comment. Though, when we add new threads, we need to remember to add the set_thread(NULL) call.

Well no, what you would do is manage your new threads in such a way that their run() method can do "delete this" as the last call. Only if you can't do that do you need to think about what termination logic is missing that needs to be done in lieu of the destructor.

Yes, but this forces every implementer of a Thread:run() function to have to think about these kind of requirements.

...
...
> >> Could we instead add a call to >> ThreadLocalStorage::set_thread(NULL), or >> maybe even Thread::clear_thread_current(), in java_start? >> >> static void *java_start(Thread *thread) { >> [...] >> thread->initialize_thread_current(); >> >> [...] >> >> // call one more level start routine >> thread->run(); >> >> ////////// Could we call Thread::clear_thread_current(); here? > > Not easily. For JavaThreads we've already done "delete this" inside > the run() method, so we'd have to move that into java_start as > well, > but we can only do the delete for JavaThreads not for other > threads. > And we'd also have to change the VMThread and WatcherThread > termination logic because of the deletes that happen in the > termination thread - the "this" pointer (thread above) may no > longer > be valid when we want to call clear_current_thread() - which is > why we > can only do the ThreadLocalStorage::set_thread(NULL). > > I agree it would be a lot cleaner to have java_start do: > > thread->common_initialization(); > thread->run(); > thread->common_cleanup(); > delete thread; > > for all threads, but we'd need a lot of other changes to allow for > that. Otherwise we would need to note that kind of thread before > calling run() then switch on the thread type after run() to decide > what kind of cleanup is necessary and possible. I don't think that > would be better than just doing the "right" cleanup at the end > of the > run() methods.

I understand that this is a bit messy, and I won't insist that we change this in this RFR, but without looking at this in much detail it sounds weird to delete the thread in run(). Couldn't this be solved by introducing a virtual Thread::post_run() function and do:

virtual void Thread::post_run() { clear_thread_current(); }

virtual void JavaThread::post_run() { Thread::post_run(); delete this; }

But again this can't work for the VMThread or WatcherThread as they are deleted from the termination thread and so thread->post_run() may SEGV.** Plus it is only after the fact that you realize not to put "delete this" in Thread::post_run().

OK, I didn't understand what you meant with "termination thread", but I now see the call to VMThread::destroy().

With that said, I find it odd that VMThread::destroy() deletes the VM thread. We already handshake between the VMThread and the "termination thread", so why isn't that VMThread::post_run() implemented as:

virtual void VMThread::post_run() { // signal other threads that VM process is gone { // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows // VM thread to enter any lock at Safepoint as long as its _owner is NULL. // If that happens after _terminate_lock->wait() has unset _owner // but before it actually drops the lock and waits, the notification below // may get lost and we will have a hang. To avoid this, we need to use // Mutex::lock_without_safepoint_check(). MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag); _terminated = true; _terminate_lock->notify(); }

Thread::post_run(); delete this; }

And then we wouldn't get a SEGV ...

I couldn't find the destructor for the WatchThread, but it seems easy to fix that as well.

I'm probably missing something, but I find it a bit annoying that code that should belong to the *Thread:ing system leaks into the implementations of *Thread::run().

Thanks, StefanK

...
** Arguably the best solution to the "thread termination races with VM termination" problem is to not let the threads terminate. The code as it exists today can still have JavaThreads destroying themselves at the same that the VM is terminating and potentially hit the same errors that require us to not allow the VMThread (and now WatcherThread) to delete themselves.

Thanks, David

...
Thanks, StefanK

> > Thanks, > David > ------ > >> >> log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", >> pthread >> id: " UINTX_FORMAT ").", >> os::current_thread_id(), (uintx) pthread_self()); >> >> return 0; >> } >> >> And get rid of the explicit calls to >> ThreadLocalStorage::set_thread(NULL) you added? >> >> Thanks, >> StefanK >> >> On 04/05/16 01:39, David Holmes wrote: >>> This needs attention from GC and runtime folk please. >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715 >>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/ >>> >>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always >>> called >>> before a thread terminates. >>> >>> Background: >>> >>> Most system-related threads do not expect to explicitly >>> terminate, >>> except sometimes as part of VM termination. Such threads don't >>> have >>> their destructors called, but should. >>> >>> This omission came to light due to the ThreadLocalStorage >>> changes in >>> JDK-8132510. As part of that change we deleted the following >>> from the >>> termination path of the VMThread: >>> >>> // Thread destructor usually does this. >>> ThreadLocalStorage::set_thread(NULL); >>> >>> The clearing of TLS seemed irrelevant to the VMThread as it >>> primarily >>> is used to aid in JNI attach/detach. However Brian Gardner >>> reported: >>> >>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm... >>> >>> >>> >>> >>> >>> a problem on FreeBSD caused by this change and the interaction >>> with >>> the POSIX pthread TLS destructor use introduced by JDK-8033696. >>> Because the VMThread terminated without clearing TLS, when the >>> TLS-destructor was called it got into a loop which ran four >>> times (as >>> happens on Linux) and then prints a warning to the console (which >>> doesn't happen on Linux). >>> >>> This indicates we need to restore the: >>> >>> ThreadLocalStorage::set_thread(NULL); >>> >>> but on further consideration it seems to me that this is not >>> confined >>> to the VMThread, and the most appropriate fix would be to always >>> invoke the Thread destructor as a thread terminates. >>> >>> Solution: >>> >>> Further investigation shows that calling the Thread destructor >>> in the >>> thread as it terminates is not possible: >>> >>> - VMThread >>> >>> This is actually destroyed by the thread that terminates the VM, >>> but >>> that can happen after it terminates and so we still hit the TLS >>> problem. The VMThread may be able to destroy itself today but in >>> the >>> past this was not possible (see existing code comment), and in >>> the >>> future it may also not be possible - the problem is that the >>> Thread >>> destructor can interact with other VM subsystems that are >>> concurrently >>> being torn down by the thread that is terminating the VM. In the >>> past >>> this was the CodeHeap. So rather than introduce something that is >>> fragile we stick with the current scheme but restore the >>> ThreadLocalStorage::set_thread(NULL); - note we can't access >>> "this" at >>> that time because it may already have been de-allocated. >>> >>> - WatcherThread >>> >>> The WatcherThread is never destroyed today but has the same >>> problem as >>> the VMThread. We can call the destructor from the VM termination >>> thread (and have implemented that), but not from the >>> WatcherThread >>> itself. So again we just have to restore the >>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS >>> problem. >>> >>> - GC Threads >>> >>> There are two cases: >>> >>> a) GC threads that never terminate >>> >>> For these we don't need to do anything: we can't delete the >>> thread as >>> it never terminates and we don't hit the TLS problem because it >>> never >>> terminates. So all we will do here is add some logic to check (in >>> NON_PRODUCT) that we do in fact never terminate. >>> >>> b) GC threads that can terminate >>> >>> Despite the fact the threads can terminate, references to those >>> threads are stored elsewhere (WorkGangs and other places) and >>> are not >>> cleared as part of the termination process. Those references >>> can be >>> touched after the thread has terminated so we can not call the >>> destructor at all. So again all we can do (without some major >>> thread >>> management reworking) is ensure that >>> ThreadLocalStorage::set_thread(NULL); is called before the thread >>> actually terminates >>> >>> Testing: JPRT >>> RBT - runtime nightly tests >>> >>> Thanks, >>> David

David Holmes

11:03 a.m.

On 9/05/2016 7:29 PM, Stefan Karlsson wrote:

...

Hi,

For java_start, would it be enough to just, unconditionally, call ThreadLocalStorage::set_thread(NULL) after thread->run() ?

That would suffice for the current problem but seems a bit patchy to me - it's only a partial cleanup of TLS. What I am going to do is based On Brian's suggestion: if (Thread::current_or_null_safe() != NULL) { thread->clear_thread_current(); } with the addition that I not only undo the deletion of the WatcherThread, but I also remove the deletion of the VMThread. Was hoping to have a webrev ready but have run into a weird build problem. Thanks, David

...

StefanK

On 2016-05-07 07:45, David Holmes wrote:

...
Brian, Stefan,

I've been giving this more thought and I think you are both right. While this can't be completely handled in java_start for all threads it can handle the most common cases. If I don't add the deletion of the WatcherThread then the problematic case reduces to that of the VMThread.

I'll do a new version on Monday. I also need to double-check those threads that don't use java_start (which I am sorely tempted to rename).

Thanks, David

On 7/05/2016 9:50 AM, David Holmes wrote:

...
On 7/05/2016 5:40 AM, Brian Gardner wrote:

...
I agree with Stefan. When I initially ran into the problem I came up with the following changeset that solves my problem, by calling clear_thread_current at the end of java_start.

http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current_alt/web...

As I said I can see the appeal in doing this, but there is still a race for the threads destroyed at VM shutdown as current_or_null() can return non-NULL but the delete can happen before we call thread->clear_current_thread().

You would also want Thread::current_or_null_safe() I think, due to already deleted JavaThreads.

And this all side-steps the real issue in my opinion that with clean thread lifecycle management we should always be able to delete the terminating thread.

So to me it is a question of selecting which set of bandaids you want to apply.

Thanks, David

...
Brian

...
On May 6, 2016, at 8:41 AM, Stefan Karlsson <stefan.karlsson@oracle.com <mailto:stefan.karlsson@oracle.com>> wrote:

On 06/05/16 16:32, David Holmes wrote:

...
On 7/05/2016 12:04 AM, Stefan Karlsson wrote: > Hi David, > > On 06/05/16 15:38, David Holmes wrote: >> Hi Stefan, >> >> Thanks for taking a look at this. >> >> On 6/05/2016 5:02 PM, Stefan Karlsson wrote: >>> Hi David, >>> >>> I looked through the GC part of this webrev and I think the >>> change is >>> fine. >>> >>> However, it seems a bit error prone. If we decide to change the >>> code to, >>> for example, terminate the AbstractGangWorker threads, then we >>> have to >>> remember to insert a ThreadLocalStorage::set_thread(NULL) call. >> >> That's why I added the ShouldNotReachHere()'s - if those threads >> start >> terminating then we will see those hit. Perhaps a comment: >> >> ShouldNotReachHere(); // If thread terminates we have to do TLS >> cleanup >> >> ? > > Yes, I would appreciate a comment. Though, when we add new > threads, we > need to remember to add the set_thread(NULL) call.

Well no, what you would do is manage your new threads in such a way that their run() method can do "delete this" as the last call. Only if you can't do that do you need to think about what termination logic is missing that needs to be done in lieu of the destructor.

Yes, but this forces every implementer of a Thread:run() function to have to think about these kind of requirements.

...
>> >>> Could we instead add a call to >>> ThreadLocalStorage::set_thread(NULL), or >>> maybe even Thread::clear_thread_current(), in java_start? >>> >>> static void *java_start(Thread *thread) { >>> [...] >>> thread->initialize_thread_current(); >>> >>> [...] >>> >>> // call one more level start routine >>> thread->run(); >>> >>> ////////// Could we call Thread::clear_thread_current(); here? >> >> Not easily. For JavaThreads we've already done "delete this" inside >> the run() method, so we'd have to move that into java_start as >> well, >> but we can only do the delete for JavaThreads not for other >> threads. >> And we'd also have to change the VMThread and WatcherThread >> termination logic because of the deletes that happen in the >> termination thread - the "this" pointer (thread above) may no >> longer >> be valid when we want to call clear_current_thread() - which is >> why we >> can only do the ThreadLocalStorage::set_thread(NULL). >> >> I agree it would be a lot cleaner to have java_start do: >> >> thread->common_initialization(); >> thread->run(); >> thread->common_cleanup(); >> delete thread; >> >> for all threads, but we'd need a lot of other changes to allow for >> that. Otherwise we would need to note that kind of thread before >> calling run() then switch on the thread type after run() to decide >> what kind of cleanup is necessary and possible. I don't think that >> would be better than just doing the "right" cleanup at the end >> of the >> run() methods. > > I understand that this is a bit messy, and I won't insist that we > change > this in this RFR, but without looking at this in much detail it > sounds > weird to delete the thread in run(). Couldn't this be solved by > introducing a virtual Thread::post_run() function and do: > > virtual void Thread::post_run() { > clear_thread_current(); > } > > virtual void JavaThread::post_run() { > Thread::post_run(); > delete this; > }

But again this can't work for the VMThread or WatcherThread as they are deleted from the termination thread and so thread->post_run() may SEGV.** Plus it is only after the fact that you realize not to put "delete this" in Thread::post_run().

OK, I didn't understand what you meant with "termination thread", but I now see the call to VMThread::destroy().

With that said, I find it odd that VMThread::destroy() deletes the VM thread. We already handshake between the VMThread and the "termination thread", so why isn't that VMThread::post_run() implemented as:

virtual void VMThread::post_run() { // signal other threads that VM process is gone { // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows // VM thread to enter any lock at Safepoint as long as its _owner is NULL. // If that happens after _terminate_lock->wait() has unset _owner // but before it actually drops the lock and waits, the notification below // may get lost and we will have a hang. To avoid this, we need to use // Mutex::lock_without_safepoint_check(). MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag); _terminated = true; _terminate_lock->notify(); }

Thread::post_run(); delete this; }

And then we wouldn't get a SEGV ...

I couldn't find the destructor for the WatchThread, but it seems easy to fix that as well.

I'm probably missing something, but I find it a bit annoying that code that should belong to the *Thread:ing system leaks into the implementations of *Thread::run().

Thanks, StefanK

...
** Arguably the best solution to the "thread termination races with VM termination" problem is to not let the threads terminate. The code as it exists today can still have JavaThreads destroying themselves at the same that the VM is terminating and potentially hit the same errors that require us to not allow the VMThread (and now WatcherThread) to delete themselves.

Thanks, David

> Thanks, > StefanK > >> >> Thanks, >> David >> ------ >> >>> >>> log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", >>> pthread >>> id: " UINTX_FORMAT ").", >>> os::current_thread_id(), (uintx) pthread_self()); >>> >>> return 0; >>> } >>> >>> And get rid of the explicit calls to >>> ThreadLocalStorage::set_thread(NULL) you added? >>> >>> Thanks, >>> StefanK >>> >>> On 04/05/16 01:39, David Holmes wrote: >>>> This needs attention from GC and runtime folk please. >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715 >>>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/ >>>> >>>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always >>>> called >>>> before a thread terminates. >>>> >>>> Background: >>>> >>>> Most system-related threads do not expect to explicitly >>>> terminate, >>>> except sometimes as part of VM termination. Such threads don't >>>> have >>>> their destructors called, but should. >>>> >>>> This omission came to light due to the ThreadLocalStorage >>>> changes in >>>> JDK-8132510. As part of that change we deleted the following >>>> from the >>>> termination path of the VMThread: >>>> >>>> // Thread destructor usually does this. >>>> ThreadLocalStorage::set_thread(NULL); >>>> >>>> The clearing of TLS seemed irrelevant to the VMThread as it >>>> primarily >>>> is used to aid in JNI attach/detach. However Brian Gardner >>>> reported: >>>> >>>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm... >>>> >>>> >>>> >>>> >>>> >>>> a problem on FreeBSD caused by this change and the interaction >>>> with >>>> the POSIX pthread TLS destructor use introduced by JDK-8033696. >>>> Because the VMThread terminated without clearing TLS, when the >>>> TLS-destructor was called it got into a loop which ran four >>>> times (as >>>> happens on Linux) and then prints a warning to the console (which >>>> doesn't happen on Linux). >>>> >>>> This indicates we need to restore the: >>>> >>>> ThreadLocalStorage::set_thread(NULL); >>>> >>>> but on further consideration it seems to me that this is not >>>> confined >>>> to the VMThread, and the most appropriate fix would be to always >>>> invoke the Thread destructor as a thread terminates. >>>> >>>> Solution: >>>> >>>> Further investigation shows that calling the Thread destructor >>>> in the >>>> thread as it terminates is not possible: >>>> >>>> - VMThread >>>> >>>> This is actually destroyed by the thread that terminates the VM, >>>> but >>>> that can happen after it terminates and so we still hit the TLS >>>> problem. The VMThread may be able to destroy itself today but in >>>> the >>>> past this was not possible (see existing code comment), and in >>>> the >>>> future it may also not be possible - the problem is that the >>>> Thread >>>> destructor can interact with other VM subsystems that are >>>> concurrently >>>> being torn down by the thread that is terminating the VM. In the >>>> past >>>> this was the CodeHeap. So rather than introduce something that is >>>> fragile we stick with the current scheme but restore the >>>> ThreadLocalStorage::set_thread(NULL); - note we can't access >>>> "this" at >>>> that time because it may already have been de-allocated. >>>> >>>> - WatcherThread >>>> >>>> The WatcherThread is never destroyed today but has the same >>>> problem as >>>> the VMThread. We can call the destructor from the VM termination >>>> thread (and have implemented that), but not from the >>>> WatcherThread >>>> itself. So again we just have to restore the >>>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS >>>> problem. >>>> >>>> - GC Threads >>>> >>>> There are two cases: >>>> >>>> a) GC threads that never terminate >>>> >>>> For these we don't need to do anything: we can't delete the >>>> thread as >>>> it never terminates and we don't hit the TLS problem because it >>>> never >>>> terminates. So all we will do here is add some logic to check (in >>>> NON_PRODUCT) that we do in fact never terminate. >>>> >>>> b) GC threads that can terminate >>>> >>>> Despite the fact the threads can terminate, references to those >>>> threads are stored elsewhere (WorkGangs and other places) and >>>> are not >>>> cleared as part of the termination process. Those references >>>> can be >>>> touched after the thread has terminated so we can not call the >>>> destructor at all. So again all we can do (without some major >>>> thread >>>> management reworking) is ensure that >>>> ThreadLocalStorage::set_thread(NULL); is called before the thread >>>> actually terminates >>>> >>>> Testing: JPRT >>>> RBT - runtime nightly tests >>>> >>>> Thanks, >>>> David

David Holmes

10 May 10 May

12:33 a.m.

Okay here is version 2: http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/ Lots of cosmetic changes but only a couple of functional ones: - After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread' - No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false) Cosmetic changes: - renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion) - updated os::free_thread to always assume it works on the current thread (and add assert to verify that) Thanks, David

Stefan Karlsson

8:24 a.m.

Hi David, On 10/05/16 02:33, David Holmes wrote:

...

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

This makes the model easier to understand, IMHO. Either you delete the thread from the run() method, or you don't delete it at all. I's there a way to determine how much memory we leak by not deleting the memory owned by the VMThread instance? I'm a bit worried that the VMThread might use more resources than the other threads we don't delete. Thanks, StefanK

...

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

David Holmes

10:17 a.m.

Hi Stefan, Thanks for looking at this again. On 10/05/2016 6:24 PM, Stefan Karlsson wrote:

...

Hi David,

On 10/05/16 02:33, David Holmes wrote:

...
Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

This makes the model easier to understand, IMHO. Either you delete the thread from the run() method, or you don't delete it at all.

I's there a way to determine how much memory we leak by not deleting the memory owned by the VMThread instance? I'm a bit worried that the VMThread might use more resources than the other threads we don't delete.

There is nothing at all special about the VMThread instance, it is just a NamedThread with nothing being referenced by instance fields beyond the name. Any runtime resources are released as the thread terminates - and nothing special there either AFAICS. Thanks, David

...

Thanks, StefanK

...
Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

Stefan Karlsson

6:59 p.m.

Hi David, On 10/05/16 12:17, David Holmes wrote:

...

Hi Stefan,

Thanks for looking at this again.

On 10/05/2016 6:24 PM, Stefan Karlsson wrote:

...
Hi David,

On 10/05/16 02:33, David Holmes wrote:

...
Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

This makes the model easier to understand, IMHO. Either you delete the thread from the run() method, or you don't delete it at all.

I's there a way to determine how much memory we leak by not deleting the memory owned by the VMThread instance? I'm a bit worried that the VMThread might use more resources than the other threads we don't delete.

There is nothing at all special about the VMThread instance, it is just a NamedThread with nothing being referenced by instance fields beyond the name. Any runtime resources are released as the thread terminates - and nothing special there either AFAICS.

I was more thinking about the contents of the following memory: Thread::~Thread() { ObjectSynchronizer::omFlush(this); [...] delete resource_area(); [...] delete handle_area(); delete metadata_handles(); The amount of memory that we don't hand back is about 1500 bytes for the VMThread. Most comes from the resource_area(), which is initialized to (init_size = 1*K - slack) in the Thread constructor. The amount of memory doesn't seem to be out-of-the ordinary, so leaking the VMThread doesn't seem to be worse than leaking the other threads, as you said. The patch looks good to me, but you probably want to get a Review by someone well-versed in the HotSpot threading system. Thanks, StefanK

...

Thanks, David

...
Thanks, StefanK

...
Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

Brian Gardner

8:28 p.m.

I really like the rename of java_start to native_thread_entry, it makes things make more sense. It looks like os_bsd.cpp and os_linux.cpp are both missing the clear_thread_current logic after thread->run(). + // If a thread has not deleted itself ("delete this") as part of its + // termination sequence, we have to ensure thread-local-storage is + // cleared before we actually terminate. No threads should ever be + // deleted asynchronously with respect to their termination. + if (Thread::current_or_null_safe() != NULL) { + assert(Thread::current_or_null_safe() == thread, "current thread is wrong"); + thread->clear_thread_current(); + } +

...

On May 9, 2016, at 5:33 PM, David Holmes <david.holmes@oracle.com> wrote:

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

David Holmes

11 May 11 May

2:17 a.m.

On 11/05/2016 6:28 AM, Brian Gardner wrote:

...

I really like the rename of java_start to native_thread_entry, it makes things make more sense. It looks like os_bsd.cpp and os_linux.cpp are both missing the clear_thread_current logic after thread->run().

Doh! Thanks. That's what I get for trying to multi-task :) Webrev updated in place. Can I get another review from someone in hotspot please. Dan is unfortunately (for me) away. Thanks, David

...

+ // If a thread has not deleted itself ("delete this") as part of its + // termination sequence, we have to ensure thread-local-storage is + // cleared before we actually terminate. No threads should ever be + // deleted asynchronously with respect to their termination. + if (Thread::current_or_null_safe() != NULL) { + assert(Thread::current_or_null_safe() == thread, "current thread is wrong"); + thread->clear_thread_current(); + } +

...
On May 9, 2016, at 5:33 PM, David Holmes <david.holmes@oracle.com <mailto:david.holmes@oracle.com>> wrote:

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

serguei.spitsyn＠oracle.com

3:42 a.m.

David, It looks good and safe. Is there anything to do for the closed code as well? Thanks, Serguei On 5/10/16 19:17, David Holmes wrote:

...

On 11/05/2016 6:28 AM, Brian Gardner wrote:

...
I really like the rename of java_start to native_thread_entry, it makes things make more sense. It looks like os_bsd.cpp and os_linux.cpp are both missing the clear_thread_current logic after thread->run().

Doh! Thanks. That's what I get for trying to multi-task :)

Webrev updated in place.

Can I get another review from someone in hotspot please. Dan is unfortunately (for me) away.

Thanks, David

...
+ // If a thread has not deleted itself ("delete this") as part of its + // termination sequence, we have to ensure thread-local-storage is + // cleared before we actually terminate. No threads should ever be + // deleted asynchronously with respect to their termination. + if (Thread::current_or_null_safe() != NULL) { + assert(Thread::current_or_null_safe() == thread, "current thread is wrong"); + thread->clear_thread_current(); + } +

...
On May 9, 2016, at 5:33 PM, David Holmes <david.holmes@oracle.com <mailto:david.holmes@oracle.com>> wrote:

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

David Holmes

4:47 a.m.

On 11/05/2016 1:42 PM, serguei.spitsyn@oracle.com wrote:

...

David,

It looks good and safe.

Thanks Serguei!

...

Is there anything to do for the closed code as well?

No. David

...

Thanks, Serguei

On 5/10/16 19:17, David Holmes wrote:

...
On 11/05/2016 6:28 AM, Brian Gardner wrote:

...
I really like the rename of java_start to native_thread_entry, it makes things make more sense. It looks like os_bsd.cpp and os_linux.cpp are both missing the clear_thread_current logic after thread->run().

Doh! Thanks. That's what I get for trying to multi-task :)

Webrev updated in place.

Can I get another review from someone in hotspot please. Dan is unfortunately (for me) away.

Thanks, David

...
+ // If a thread has not deleted itself ("delete this") as part of its + // termination sequence, we have to ensure thread-local-storage is + // cleared before we actually terminate. No threads should ever be + // deleted asynchronously with respect to their termination. + if (Thread::current_or_null_safe() != NULL) { + assert(Thread::current_or_null_safe() == thread, "current thread is wrong"); + thread->clear_thread_current(); + } +

...
On May 9, 2016, at 5:33 PM, David Holmes <david.holmes@oracle.com <mailto:david.holmes@oracle.com>> wrote:

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

serguei.spitsyn＠oracle.com

5:30 a.m.

On 5/10/16 21:47, David Holmes wrote:

...

On 11/05/2016 1:42 PM, serguei.spitsyn@oracle.com wrote:

...
David,

It looks good and safe.

Thanks Serguei!

...
Is there anything to do for the closed code as well?

No.

Ok. Thanks, Serguei

...

David

...
Thanks, Serguei

On 5/10/16 19:17, David Holmes wrote:

...
On 11/05/2016 6:28 AM, Brian Gardner wrote:

...
I really like the rename of java_start to native_thread_entry, it makes things make more sense. It looks like os_bsd.cpp and os_linux.cpp are both missing the clear_thread_current logic after thread->run().

Doh! Thanks. That's what I get for trying to multi-task :)

Webrev updated in place.

Can I get another review from someone in hotspot please. Dan is unfortunately (for me) away.

Thanks, David

...
+ // If a thread has not deleted itself ("delete this") as part of its + // termination sequence, we have to ensure thread-local-storage is + // cleared before we actually terminate. No threads should ever be + // deleted asynchronously with respect to their termination. + if (Thread::current_or_null_safe() != NULL) { + assert(Thread::current_or_null_safe() == thread, "current thread is wrong"); + thread->clear_thread_current(); + } +

...
On May 9, 2016, at 5:33 PM, David Holmes <david.holmes@oracle.com <mailto:david.holmes@oracle.com>> wrote:

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

Brian Gardner

4:11 a.m.

Looks good to me, and fixes my issues. Thanks David! Brian

...

On May 10, 2016, at 7:17 PM, David Holmes <david.holmes@oracle.com> wrote:

On 11/05/2016 6:28 AM, Brian Gardner wrote:

...
I really like the rename of java_start to native_thread_entry, it makes things make more sense. It looks like os_bsd.cpp and os_linux.cpp are both missing the clear_thread_current logic after thread->run().

Doh! Thanks. That's what I get for trying to multi-task :)

Webrev updated in place.

Can I get another review from someone in hotspot please. Dan is unfortunately (for me) away.

Thanks, David

...
+ // If a thread has not deleted itself ("delete this") as part of its + // termination sequence, we have to ensure thread-local-storage is + // cleared before we actually terminate. No threads should ever be + // deleted asynchronously with respect to their termination. + if (Thread::current_or_null_safe() != NULL) { + assert(Thread::current_or_null_safe() == thread, "current thread is wrong"); + thread->clear_thread_current(); + } +

...
On May 9, 2016, at 5:33 PM, David Holmes <david.holmes@oracle.com <mailto:david.holmes@oracle.com>> wrote:

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

David Holmes

4:48 a.m.

On 11/05/2016 2:11 PM, Brian Gardner wrote:

...

Looks good to me, and fixes my issues. Thanks David!

Thanks Brian! I will list you as a co-contributor. David

...

Brian

...
On May 10, 2016, at 7:17 PM, David Holmes <david.holmes@oracle.com> wrote:

On 11/05/2016 6:28 AM, Brian Gardner wrote:

...
I really like the rename of java_start to native_thread_entry, it makes things make more sense. It looks like os_bsd.cpp and os_linux.cpp are both missing the clear_thread_current logic after thread->run().

Doh! Thanks. That's what I get for trying to multi-task :)

Webrev updated in place.

Can I get another review from someone in hotspot please. Dan is unfortunately (for me) away.

Thanks, David

...
+ // If a thread has not deleted itself ("delete this") as part of its + // termination sequence, we have to ensure thread-local-storage is + // cleared before we actually terminate. No threads should ever be + // deleted asynchronously with respect to their termination. + if (Thread::current_or_null_safe() != NULL) { + assert(Thread::current_or_null_safe() == thread, "current thread is wrong"); + thread->clear_thread_current(); + } +

...
On May 9, 2016, at 5:33 PM, David Holmes <david.holmes@oracle.com <mailto:david.holmes@oracle.com>> wrote:

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

Daniel D. Daugherty

16 May 16 May

2:15 p.m.

On 5/10/16 8:17 PM, David Holmes wrote:

...

On 11/05/2016 6:28 AM, Brian Gardner wrote:

...
I really like the rename of java_start to native_thread_entry, it makes things make more sense. It looks like os_bsd.cpp and os_linux.cpp are both missing the clear_thread_current logic after thread->run().

Doh! Thanks. That's what I get for trying to multi-task :)

Webrev updated in place.

Can I get another review from someone in hotspot please. Dan is unfortunately (for me) away.

Sorry about that. Had end of year choir and graduation things to take care of... :-)

...

so anyone introducing their deletion is informed by a guarantee(false)

In recent reviews, folks have been asking that we use fatal("my message"); instead of: guarantee(false, "my message"); I see the changeset is already pushed so don't worry about it for this one. Dan

...

Thanks, David

...
+ // If a thread has not deleted itself ("delete this") as part of its + // termination sequence, we have to ensure thread-local-storage is + // cleared before we actually terminate. No threads should ever be + // deleted asynchronously with respect to their termination. + if (Thread::current_or_null_safe() != NULL) { + assert(Thread::current_or_null_safe() == thread, "current thread is wrong"); + thread->clear_thread_current(); + } +

...
On May 9, 2016, at 5:33 PM, David Holmes <david.holmes@oracle.com <mailto:david.holmes@oracle.com>> wrote:

Okay here is version 2:

http://cr.openjdk.java.net/~dholmes/8154715/webrev.v2/

Lots of cosmetic changes but only a couple of functional ones:

- After thread->run() returns we clear the TLS by calling clear_thread_current(), but only for threads where it has not already been cleared - as those threads may already have been deleted so we can't dereference 'thread'

- No asynchronous thread deletion is permitted, and we avoid races with VM termination. This means the VMThread no longer gets deleted - that should not be an issue as many threads do not get deleted when the VM terminates. I added destructors for the VMThread and WatcherThread so anyone introducing their deletion is informed by a guarantee(false)

Cosmetic changes:

- renamed java_start to thread_native_entry (it is used by all threads not just "java" ones, so this avoids potential confusion)

- updated os::free_thread to always assume it works on the current thread (and add assert to verify that)

Thanks, David

David Holmes

6 May 6 May

11:38 p.m.

On 7/05/2016 1:41 AM, Stefan Karlsson wrote:

...

On 06/05/16 16:32, David Holmes wrote:

...
On 7/05/2016 12:04 AM, Stefan Karlsson wrote:

...
Hi David,

On 06/05/16 15:38, David Holmes wrote:

...
Hi Stefan,

Thanks for taking a look at this.

On 6/05/2016 5:02 PM, Stefan Karlsson wrote:

...
Hi David,

I looked through the GC part of this webrev and I think the change is fine.

However, it seems a bit error prone. If we decide to change the code to, for example, terminate the AbstractGangWorker threads, then we have to remember to insert a ThreadLocalStorage::set_thread(NULL) call.

That's why I added the ShouldNotReachHere()'s - if those threads start terminating then we will see those hit. Perhaps a comment:

ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup

?

Yes, I would appreciate a comment. Though, when we add new threads, we need to remember to add the set_thread(NULL) call.

Well no, what you would do is manage your new threads in such a way that their run() method can do "delete this" as the last call. Only if you can't do that do you need to think about what termination logic is missing that needs to be done in lieu of the destructor.

Yes, but this forces every implementer of a Thread:run() function to have to think about these kind of requirements.

Absolutely! Everyone who creates a thread should be thinking about its lifecycle. It is, in my opinion, the lack of such thinking that has gotten things in a mess.

...

...
...
...
...
Could we instead add a call to ThreadLocalStorage::set_thread(NULL), or maybe even Thread::clear_thread_current(), in java_start?

static void *java_start(Thread *thread) { [...] thread->initialize_thread_current();

[...]

// call one more level start routine thread->run();

////////// Could we call Thread::clear_thread_current(); here?

Not easily. For JavaThreads we've already done "delete this" inside the run() method, so we'd have to move that into java_start as well, but we can only do the delete for JavaThreads not for other threads. And we'd also have to change the VMThread and WatcherThread termination logic because of the deletes that happen in the termination thread - the "this" pointer (thread above) may no longer be valid when we want to call clear_current_thread() - which is why we can only do the ThreadLocalStorage::set_thread(NULL).

I agree it would be a lot cleaner to have java_start do:

thread->common_initialization(); thread->run(); thread->common_cleanup(); delete thread;

for all threads, but we'd need a lot of other changes to allow for that. Otherwise we would need to note that kind of thread before calling run() then switch on the thread type after run() to decide what kind of cleanup is necessary and possible. I don't think that would be better than just doing the "right" cleanup at the end of the run() methods.

I understand that this is a bit messy, and I won't insist that we change this in this RFR, but without looking at this in much detail it sounds weird to delete the thread in run(). Couldn't this be solved by introducing a virtual Thread::post_run() function and do:

virtual void Thread::post_run() { clear_thread_current(); }

virtual void JavaThread::post_run() { Thread::post_run(); delete this; }

But again this can't work for the VMThread or WatcherThread as they are deleted from the termination thread and so thread->post_run() may SEGV.** Plus it is only after the fact that you realize not to put "delete this" in Thread::post_run().

OK, I didn't understand what you meant with "termination thread", but I now see the call to VMThread::destroy().

With that said, I find it odd that VMThread::destroy() deletes the VM thread. We already handshake between the VMThread and the "termination thread", so why isn't that VMThread::post_run() implemented as:

virtual void VMThread::post_run() { // signal other threads that VM process is gone { // Note: we must have the _no_safepoint_check_flag. Mutex::lock() allows // VM thread to enter any lock at Safepoint as long as its _owner is NULL. // If that happens after _terminate_lock->wait() has unset _owner // but before it actually drops the lock and waits, the notification below // may get lost and we will have a hang. To avoid this, we need to use // Mutex::lock_without_safepoint_check(). MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag); _terminated = true; _terminate_lock->notify(); }

Thread::post_run(); delete this; }

And then we wouldn't get a SEGV ...

Not on the "delete this", no - but you're missing whole aspect of the potential race between the execution of the Thread destructors and the tearing down of the VM that happens in the termination thread. In the past there was an issue with ResourceMark cleanup that used a stub that existed in the CodeCache which was deleted by the terminating thread as part of VM shutdown. That particular issue no longer exists but there could be other issues. We either have to establish and enforce that nothing on the destructor path can encounter an error due to something happening on the termination path, or we maintain the current conservatism and have the terminating thread synchronously destroy the other threads. As I said upfront I chose the latter as the former is somewhat error prone and fragile.

...

I couldn't find the destructor for the WatchThread, but it seems easy to fix that as well.

I'm probably missing something, but I find it a bit annoying that code that should belong to the *Thread:ing system leaks into the implementations of *Thread::run().

Wouldn't be an issue if the "threading system" had complete lifecycle management of all threads - but it doesn't. Eg GC threads are stashed into workgangs and other shared variables that can reference the thread objects after they have terminated. Alas this isn't Java code. :) Cheers, David

...

Thanks, StefanK

...
** Arguably the best solution to the "thread termination races with VM termination" problem is to not let the threads terminate. The code as it exists today can still have JavaThreads destroying themselves at the same that the VM is terminating and potentially hit the same errors that require us to not allow the VMThread (and now WatcherThread) to delete themselves.

Thanks, David

...
Thanks, StefanK

...
Thanks, David ------

...
log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", pthread id: " UINTX_FORMAT ").", os::current_thread_id(), (uintx) pthread_self());

return 0; }

And get rid of the explicit calls to ThreadLocalStorage::set_thread(NULL) you added?

Thanks, StefanK

On 04/05/16 01:39, David Holmes wrote:

...
This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

Brian Gardner

7:36 p.m.

After applying your patches I still saw the cleanup messages from pthread. After comparing my original patch to yours and doing some testing it looks like we missed a couple spots. The warning went away after I applied the following patches: http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/... <http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/hotspot/src/share/vm/gc/shared/concurrentGCThread.cpp.cdiff.html> http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/... <http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/hotspot/src/share/vm/gc/cms/concurrentMarkSweepThread.cpp.cdiff.html> Brian

...

On May 3, 2016, at 4:39 PM, David Holmes <david.holmes@oracle.com> wrote:

This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

David Holmes

11:42 p.m.

On 7/05/2016 5:36 AM, Brian Gardner wrote:

...

After applying your patches I still saw the cleanup messages from pthread. After comparing my original patch to yours and doing some testing it looks like we missed a couple spots.

The warning went away after I applied the following patches: http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/... http://brian.timestudybuddy.com/webrev/hotspot__clear_thread_current/webrev/...

My patch moves that from inside terminate() to outside of it: 78 void ConcurrentGCThread::run() { 79 initialize_in_thread(); 80 wait_for_universe_init(); 81 82 run_service(); 83 84 terminate(); 85 86 // Can't "delete this" before we terminate as other code 87 // holds references to 'this', but we must do some cleanup 88 // ourselves before allowing the native thread to terminate 89 90 ThreadLocalStorage::set_thread(NULL); 91 } so you should not be seeing any issue because of that. ??? David

...

Brian

...
On May 3, 2016, at 4:39 PM, David Holmes <david.holmes@oracle.com <mailto:david.holmes@oracle.com>> wrote:

This needs attention from GC and runtime folk please.

bug: https://bugs.openjdk.java.net/browse/JDK-8154715 webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/

tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called before a thread terminates.

Background:

Most system-related threads do not expect to explicitly terminate, except sometimes as part of VM termination. Such threads don't have their destructors called, but should.

This omission came to light due to the ThreadLocalStorage changes in JDK-8132510. As part of that change we deleted the following from the termination path of the VMThread:

// Thread destructor usually does this. ThreadLocalStorage::set_thread(NULL);

The clearing of TLS seemed irrelevant to the VMThread as it primarily is used to aid in JNI attach/detach. However Brian Gardner reported:

http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.htm...

a problem on FreeBSD caused by this change and the interaction with the POSIX pthread TLS destructor use introduced by JDK-8033696. Because the VMThread terminated without clearing TLS, when the TLS-destructor was called it got into a loop which ran four times (as happens on Linux) and then prints a warning to the console (which doesn't happen on Linux).

This indicates we need to restore the:

ThreadLocalStorage::set_thread(NULL);

but on further consideration it seems to me that this is not confined to the VMThread, and the most appropriate fix would be to always invoke the Thread destructor as a thread terminates.

Solution:

Further investigation shows that calling the Thread destructor in the thread as it terminates is not possible:

- VMThread

This is actually destroyed by the thread that terminates the VM, but that can happen after it terminates and so we still hit the TLS problem. The VMThread may be able to destroy itself today but in the past this was not possible (see existing code comment), and in the future it may also not be possible - the problem is that the Thread destructor can interact with other VM subsystems that are concurrently being torn down by the thread that is terminating the VM. In the past this was the CodeHeap. So rather than introduce something that is fragile we stick with the current scheme but restore the ThreadLocalStorage::set_thread(NULL); - note we can't access "this" at that time because it may already have been de-allocated.

- WatcherThread

The WatcherThread is never destroyed today but has the same problem as the VMThread. We can call the destructor from the VM termination thread (and have implemented that), but not from the WatcherThread itself. So again we just have to restore the ThreadLocalStorage::set_thread(NULL); to fix the potential TLS problem.

- GC Threads

There are two cases:

a) GC threads that never terminate

For these we don't need to do anything: we can't delete the thread as it never terminates and we don't hit the TLS problem because it never terminates. So all we will do here is add some logic to check (in NON_PRODUCT) that we do in fact never terminate.

b) GC threads that can terminate

Despite the fact the threads can terminate, references to those threads are stored elsewhere (WorkGangs and other places) and are not cleared as part of the termination process. Those references can be touched after the thread has terminated so we can not call the destructor at all. So again all we can do (without some major thread management reworking) is ensure that ThreadLocalStorage::set_thread(NULL); is called before the thread actually terminates

Testing: JPRT RBT - runtime nightly tests

Thanks, David

3579

Age (days ago)

3592

Last active (days ago)

List overview

Download

28 comments

5 participants

participants (5)

Brian Gardner
Daniel D. Daugherty
David Holmes
serguei.spitsyn＠oracle.com
Stefan Karlsson

RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

tags

participants (5)