RFR: 8241296: Segfault in JNIHandleBlock::oops_do()

Fri Mar 20 08:35:01 UTC 2020

Hi Andrew,

Thanks for clarifying where and why this failed!

StefanK

On 2020-03-19 17:47, Andrew Haley wrote:
> Hi,
>
> On 3/19/20 3:22 PM, Stefan Karlsson wrote:
>
>> I think the fix is fine.
> OK, thanks.
>
>> Would you mind sharing some extra info? For example the stack trace
>> of the scanned thread, and / or flags used to provoke this? I would
>> like to know why we haven't seen this before.
> Sure.
>
> #0  0x00007ffff7dafb02 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib
> #1  0x00007ffff77533fb in os::PlatformEvent::park (this=0x7ffff0ab690
> #2  0x00007ffff7706805 in ParkCommon (timo=0, ev=0x7ffff0ab6900)
> #3  Monitor::ILock (this=this@entry=0x7ffff0005b30, Self=Self@entry=0
> #4  0x00007ffff7706ffa in Monitor::lock_without_safepoint_check (Self
> #5  Monitor::lock_without_safepoint_check (this=0x7ffff0005b30)
> #6  0x00007ffff77e7f71 in SafepointSynchronize::block (thread=0x7ffff
> #7  0x00007ffff77e6afa in SafepointSynchronize::block (thread=thread@
> #8  0x00007ffff78fd897 in ThreadStateTransition::transition_and_fence
> #9  JavaThread::run (this=0x7ffff0ab5800)
> #10 0x00007ffff7747d78 in java_start (thread=0x7ffff0ab5800)
> #11 0x00007ffff7da9472 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007ffff7ee5063 in clone () from /lib64/libc.so.6
>
> The thread blocked in transition_and_fence() here: note this is in JDK
> 8, but it hasn't changed AFAICS:
>
> // The first routine called by a new Java thread
> void JavaThread::run() {
>    // initialize thread-local alloc buffer related fields
>    this->initialize_tlab();
>
>    // used to test validitity of stack trace backs
>    this->record_base_of_stack_pointer();
>
>    // Record real stack base and size.
>    this->record_stack_base_and_size();
>
>    // Initialize thread local storage; set before calling MutexLocker
>    this->initialize_thread_local_storage();
>
>    this->create_stack_guard_pages();
>
>    this->cache_global_variables();
>
>    // Thread is now sufficient initialized to be handled by the safepoint code as being
>    // in the VM. Change thread state from _thread_new to _thread_in_vm
> =>ThreadStateTransition::transition_and_fence(this, _thread_new, _thread_in_vm);
>
>    assert(JavaThread::current() == this, "sanity check");
>    assert(!Thread::current()->owns_locks(), "sanity check");
>
>    DTRACE_THREAD_PROBE(start, this);
>
>    // This operation might block. We call that after all safepoint checks for a new thread has
>    // been completed.
>    this->set_active_handles(JNIHandleBlock::allocate_block());
>
> So it's pretty obvious why active_handles wasn't set yet. This code
> isn't obviously different from that in jdk/jdk, but I have not been
> able to reproduce the bug there. IMO, though, it's still a bug in
> jdk/jdk.
>
> The most likely reason we haven't seen this before is that
> JNIHandleBlock::oops_do() looks like this:
>
> void JNIHandleBlock::oops_do(OopClosure* f) {
>    JNIHandleBlock* current_chain = this;
>    while (current_chain != NULL) {
>      ...
>    }
>
> A sufficiently adversarial compiler can turn this into
>
> void JNIHandleBlock::oops_do(OopClosure* f) {
>    JNIHandleBlock* current_chain = this;
>    do {
>      ...
>    } while (current_chain != NULL);
>
> because "this" can never be null in a member function. GCC sometimes
> does this transformation.
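
To illustrate the point about "this", here is a minimal sketch. It is not
the HotSpot code: Block, FakeThread, and scan_thread are hypothetical
names made up for the example. It assumes the null test is kept in the
caller, so the traversal never depends on a check that the compiler is
free to drop.

```cpp
// Hypothetical stand-in for a chained handle block; not the HotSpot class.
struct Block {
  int    count;  // number of live entries in this block
  Block* next;   // next block in the chain, or nullptr

  // Walks the chain starting at this block. Because "this" is assumed
  // to be non-null, a compiler may legally skip the first null test.
  int visit_all() const {
    int visited = 0;
    for (const Block* b = this; b != nullptr; b = b->next) {
      visited += b->count;
    }
    return visited;
  }
};

// Hypothetical stand-in for a thread whose handle block may not be set yet.
struct FakeThread {
  Block* active_handles = nullptr;
};

// The null test lives in the caller, so no member function is ever
// invoked through a null pointer.
int scan_thread(const FakeThread& t) {
  if (t.active_handles == nullptr) {
    return 0;  // the thread has not allocated its handle block yet
  }
  return t.active_handles->visit_all();
}
```

With this shape, a thread that is still blocked before its handle block
is allocated is simply skipped, whichever loop form the compiler picks
for the traversal.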
>