RFR: 8241296: Segfault in JNIHandleBlock::oops_do()
Andrew Haley
aph at redhat.com
Thu Mar 19 16:47:07 UTC 2020
Hi,
On 3/19/20 3:22 PM, Stefan Karlsson wrote:
> I think the fix is fine.
OK, thanks.
> Would you mind sharing some extra info? For example the stack trace
> of the scanned thread, and / or flags used to provoke this? I would
> like to know why we haven't seen this before.
Sure.
#0 0x00007ffff7dafb02 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib
#1 0x00007ffff77533fb in os::PlatformEvent::park (this=0x7ffff0ab690
#2 0x00007ffff7706805 in ParkCommon (timo=0, ev=0x7ffff0ab6900)
#3 Monitor::ILock (this=this at entry=0x7ffff0005b30, Self=Self at entry=0
#4 0x00007ffff7706ffa in Monitor::lock_without_safepoint_check (Self
#5 Monitor::lock_without_safepoint_check (this=0x7ffff0005b30)
#6 0x00007ffff77e7f71 in SafepointSynchronize::block (thread=0x7ffff
#7 0x00007ffff77e6afa in SafepointSynchronize::block (thread=thread@
#8 0x00007ffff78fd897 in ThreadStateTransition::transition_and_fence
#9 JavaThread::run (this=0x7ffff0ab5800)
#10 0x00007ffff7747d78 in java_start (thread=0x7ffff0ab5800)
#11 0x00007ffff7da9472 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff7ee5063 in clone () from /lib64/libc.so.6
The thread blocked in transition_and_fence() here: note this is in JDK
8, but it hasn't changed AFAICS:
// The first routine called by a new Java thread
void JavaThread::run() {
// initialize thread-local alloc buffer related fields
this->initialize_tlab();
// used to test validitity of stack trace backs
this->record_base_of_stack_pointer();
// Record real stack base and size.
this->record_stack_base_and_size();
// Initialize thread local storage; set before calling MutexLocker
this->initialize_thread_local_storage();
this->create_stack_guard_pages();
this->cache_global_variables();
// Thread is now sufficient initialized to be handled by the safepoint code as being
// in the VM. Change thread state from _thread_new to _thread_in_vm
=>ThreadStateTransition::transition_and_fence(this, _thread_new, _thread_in_vm);
assert(JavaThread::current() == this, "sanity check");
assert(!Thread::current()->owns_locks(), "sanity check");
DTRACE_THREAD_PROBE(start, this);
// This operation might block. We call that after all safepoint checks for a new thread has
// been completed.
this->set_active_handles(JNIHandleBlock::allocate_block());
So it's pretty obvious why active_handles wasn't set yet. This code
isn't obviously different from that in jdk/jdk, but I have not been
able to reproduce the bug there. IMO, though, it's still a bug in
jdk/jdk.
The most likely reason we haven't seen this before is that
JNIHandleBlock::oops_do() looks like this:
void JNIHandleBlock::oops_do(OopClosure* f) {
JNIHandleBlock* current_chain = this;
while (current_chain != NULL) {
...
}
A sufficiently adversarial compiler can turn this into
void JNIHandleBlock::oops_do(OopClosure* f) {
JNIHandleBlock* current_chain = this;
do {
...
} while (current_chain != NULL)
because "this" can never be null in a member function. GCC sometimes
does this transformation.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
More information about the hotspot-gc-dev
mailing list