RFR: 8241296: Segfault in JNIHandleBlock::oops_do()

Thu Mar 19 16:47:07 UTC 2020

Hi,

On 3/19/20 3:22 PM, Stefan Karlsson wrote:

> I think the fix is fine.

OK, thanks.

> Would you mind sharing some extra info? For example the stack trace
> of the scanned thread, and / or flags used to provoke this? I would
> like to know why we haven't seen this before.

Sure.

#0  0x00007ffff7dafb02 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib
#1  0x00007ffff77533fb in os::PlatformEvent::park (this=0x7ffff0ab690
#2  0x00007ffff7706805 in ParkCommon (timo=0, ev=0x7ffff0ab6900)
#3  Monitor::ILock (this=this@entry=0x7ffff0005b30, Self=Self@entry=0
#4  0x00007ffff7706ffa in Monitor::lock_without_safepoint_check (Self
#5  Monitor::lock_without_safepoint_check (this=0x7ffff0005b30)
#6  0x00007ffff77e7f71 in SafepointSynchronize::block (thread=0x7ffff
#7  0x00007ffff77e6afa in SafepointSynchronize::block (thread=thread@
#8  0x00007ffff78fd897 in ThreadStateTransition::transition_and_fence
#9  JavaThread::run (this=0x7ffff0ab5800)
#10 0x00007ffff7747d78 in java_start (thread=0x7ffff0ab5800)
#11 0x00007ffff7da9472 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff7ee5063 in clone () from /lib64/libc.so.6

The thread is blocked in transition_and_fence(), at the line marked
with => below. Note that this is JDK 8 code, but it hasn't changed
AFAICS:

// The first routine called by a new Java thread
void JavaThread::run() {
  // initialize thread-local alloc buffer related fields
  this->initialize_tlab();

  // used to test validitity of stack trace backs
  this->record_base_of_stack_pointer();

  // Record real stack base and size.
  this->record_stack_base_and_size();

  // Initialize thread local storage; set before calling MutexLocker
  this->initialize_thread_local_storage();

  this->create_stack_guard_pages();

  this->cache_global_variables();

  // Thread is now sufficient initialized to be handled by the safepoint code as being
  // in the VM. Change thread state from _thread_new to _thread_in_vm
=>ThreadStateTransition::transition_and_fence(this, _thread_new, _thread_in_vm);

  assert(JavaThread::current() == this, "sanity check");
  assert(!Thread::current()->owns_locks(), "sanity check");

  DTRACE_THREAD_PROBE(start, this);

  // This operation might block. We call that after all safepoint checks for a new thread has
  // been completed.
  this->set_active_handles(JNIHandleBlock::allocate_block());

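So it's pretty obvious why active_handles wasn't set yet: the thread
blocks for the safepoint inside transition_and_fence(), before it
reaches set_active_handles(), so it is scanned while active_handles
is still NULL. This code isn't obviously different from that in
jdk/jdk, but I have not been able to reproduce the bug there. IMO,
though, it's still a bug in jdk/jdk.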

The most likely reason we haven't seen this before is that
JNIHandleBlock::oops_do() looks like this:

void JNIHandleBlock::oops_do(OopClosure* f) {
  JNIHandleBlock* current_chain = this;
  while (current_chain != NULL) {
    ...
  }

A sufficiently adversarial compiler can turn this into

void JNIHandleBlock::oops_do(OopClosure* f) {
  JNIHandleBlock* current_chain = this;
  do {
    ...
  } while (current_chain != NULL);

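because calling a member function through a null pointer is undefined
behaviour, so the compiler may assume "this" is never null and drop
the first test. With the while form a null "this" just skips the loop;
with the do form the body runs once and dereferences it. GCC sometimes
does this transformation.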
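
To make the point concrete, here is a minimal sketch of a caller-side
guard. It is not the HotSpot code and not necessarily the patch under
review: Block, ThreadLike, visit_all() and scan_handles() are
hypothetical stand-ins for JNIHandleBlock, JavaThread and their
oops_do() methods. The idea is only that the null test has to happen
before the member call, where the compiler cannot remove it.

```cpp
#include <cassert>   // for the asserts in the usage note
#include <cstddef>   // NULL

// Hypothetical stand-in for JNIHandleBlock: a singly linked chain.
struct Block {
  Block* next;
  int    payload;

  // Counts the blocks in the chain starting at this one. Calling this
  // through a null pointer is undefined behaviour, so the compiler may
  // assume "this" is non-null and skip the first loop test.
  int visit_all() {
    int visited = 0;
    for (Block* b = this; b != NULL; b = b->next) {
      visited++;
    }
    return visited;
  }
};

// Hypothetical stand-in for a thread whose handle block is only set
// late in its startup sequence.
struct ThreadLike {
  Block* active_handles = NULL;   // NULL until startup has finished

  // Caller-side guard: never invoke a member function on a null block.
  int scan_handles() {
    if (active_handles == NULL) {
      return 0;                   // nothing to scan yet
    }
    return active_handles->visit_all();
  }
};
```

With this shape, scanning a ThreadLike whose active_handles has not
been set returns 0 without touching the block, and scanning one that
points at a two-block chain returns 2.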

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671