RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v6]

Tue Sep 29 14:51:07 UTC 2020

On Tue, 29 Sep 2020 14:39:23 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

>>> Hi Erik,
>>> 
>>> I have been playing with this patch for past a few days. Great work!
>>> 
>>> I found that this patch seems to break an early assumption.
>>> We have a comment in JavaThread::exit() says:
>>> 
>>>   // We need to cache the thread name for logging purposes below as once
>>>   // we have called on_thread_detach this thread must not access any oops.
>>> 
>>> Then in method :
>>> 
>>> ```
>>> 
>>> void Threads::remove(JavaThread* p, bool is_daemon)  {
>>>  ...
>>>   BarrierSet::barrier_set()->on_thread_detach(p);
>>> 
>>>   // Extra scope needed for Thread_lock, so we can check
>>>   // that we do not remove thread without safepoint code notice
>>>   { MonitorLocker ml(Threads_lock);
>>> ..
>>> }
>>> ```
>>> 
>>> It calls barrier's on_thread_detach(), acquires Threads_lock.
>>> The lock acquisition triggers stack processing, that potential accesses oops.
>>> 
>>> ```
>>> 
>>> V  [libjvm.so+0x10c6f5e]  StackWatermark::start_processing()+0x6a
>>> V  [libjvm.so+0x10c77e8]  StackWatermarkSet::start_processing(JavaThread*, StackWatermarkKind)+0x82
>>> V  [libjvm.so+0xfd7757]  SafepointMechanism::process(JavaThread*)+0x37
>>> V  [libjvm.so+0xfd796b]  SafepointMechanism::process_if_requested_slow(JavaThread*)+0x1d
>>> V  [libjvm.so+0x4b3683]  SafepointMechanism::process_if_requested(JavaThread*)+0x2b
>>> V  [libjvm.so+0xe87f0d]  ThreadBlockInVMWithDeadlockCheck::~ThreadBlockInVMWithDeadlockCheck()+0x5f
>>> V  [libjvm.so+0xe86700]  Mutex::lock_contended(Thread*)+0x12c
>>> V  [libjvm.so+0xe867d8]  Mutex::lock(Thread*)+0x96
>>> V  [libjvm.so+0xe86823]  Mutex::lock()+0x23
>>> V  [libjvm.so+0x29b4bc]  MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)+0xe2
>>> V  [libjvm.so+0x29b533]  MonitorLocker::MonitorLocker(Monitor*, Mutex::SafepointCheckFlag)+0x29
>>> V  [libjvm.so+0x119f2ce]  Threads::remove(JavaThread*, bool)+0x56
>>> V  [libjvm.so+0x1198a2b]  JavaThread::exit(bool, JavaThread::ExitType)+0x905
>>> V  [libjvm.so+0x1197fde]  JavaThread::post_run()+0x22
>>> V  [libjvm.so+0x1193eae]  Thread::call_run()+0x230
>>> V  [libjvm.so+0xef3e38]  thread_native_entry(Thread*)+0x1e4
>>> ```
>>> 
>>> This is a problem for Shenandoah, as it flushes SATB buffers during on_thread_detach() and does not expect to see any
>>> more SATB traffic.
>>> Thanks.
>> 
>> What oop are you encountering here? You should have no frames left at this point, and all oops should have been
>> cleared. At least that is the theory, and that was why the thread oop moved out from the thread (to enforce that). So I
>> am curious what oop you have found to still be around at this point.  Anyway, you can try moving the GC hook into the
>> critical section. That should help.
>
>> > Hi Erik,
>> > I have been playing with this patch for past a few days. Great work!
>> > I found that this patch seems to break an early assumption.
>> > We have a comment in JavaThread::exit() says:
>> > // We need to cache the thread name for logging purposes below as once
>> > // we have called on_thread_detach this thread must not access any oops.
>> > Then in method :
>> > ```
>> > 
>> > void Threads::remove(JavaThread* p, bool is_daemon)  {
>> >  ...
>> >   BarrierSet::barrier_set()->on_thread_detach(p);
>> > 
>> >   // Extra scope needed for Thread_lock, so we can check
>> >   // that we do not remove thread without safepoint code notice
>> >   { MonitorLocker ml(Threads_lock);
>> > ..
>> > }
>> > ```
>> > 
>> > 
>> > It calls barrier's on_thread_detach(), acquires Threads_lock.
>> > The lock acquisition triggers stack processing, that potential accesses oops.
>> > ```
>> > 
>> > V  [libjvm.so+0x10c6f5e]  StackWatermark::start_processing()+0x6a
>> > V  [libjvm.so+0x10c77e8]  StackWatermarkSet::start_processing(JavaThread*, StackWatermarkKind)+0x82
>> > V  [libjvm.so+0xfd7757]  SafepointMechanism::process(JavaThread*)+0x37
>> > V  [libjvm.so+0xfd796b]  SafepointMechanism::process_if_requested_slow(JavaThread*)+0x1d
>> > V  [libjvm.so+0x4b3683]  SafepointMechanism::process_if_requested(JavaThread*)+0x2b
>> > V  [libjvm.so+0xe87f0d]  ThreadBlockInVMWithDeadlockCheck::~ThreadBlockInVMWithDeadlockCheck()+0x5f
>> > V  [libjvm.so+0xe86700]  Mutex::lock_contended(Thread*)+0x12c
>> > V  [libjvm.so+0xe867d8]  Mutex::lock(Thread*)+0x96
>> > V  [libjvm.so+0xe86823]  Mutex::lock()+0x23
>> > V  [libjvm.so+0x29b4bc]  MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)+0xe2
>> > V  [libjvm.so+0x29b533]  MonitorLocker::MonitorLocker(Monitor*, Mutex::SafepointCheckFlag)+0x29
>> > V  [libjvm.so+0x119f2ce]  Threads::remove(JavaThread*, bool)+0x56
>> > V  [libjvm.so+0x1198a2b]  JavaThread::exit(bool, JavaThread::ExitType)+0x905
>> > V  [libjvm.so+0x1197fde]  JavaThread::post_run()+0x22
>> > V  [libjvm.so+0x1193eae]  Thread::call_run()+0x230
>> > V  [libjvm.so+0xef3e38]  thread_native_entry(Thread*)+0x1e4
>> > ```
>> > 
>> > 
>> > This is a problem for Shenandoah, as it flushes SATB buffers during on_thread_detach() and does not expect to see any
>> > more SATB traffic. Thanks.
>> 
>> What oop are you encountering here? You should have no frames left at this point, and all oops should have been
>> cleared. At least that is the theory, and that was why the thread oop moved out from the thread (to enforce that). So I
>> am curious what oop you have found to still be around at this point.
> 
> They are handles:
> <pre><code>
> V  [libjvm.so+0x98d693]  chunk_oops_do(OopClosure*, Chunk*, char*)+0xce
> V  [libjvm.so+0x98d6d3]  HandleArea::oops_do(OopClosure*)+0x37
> V  [libjvm.so+0x1194ad0]  Thread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x88
> V  [libjvm.so+0x119b0c8]  JavaThread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x88
> </code></pre>
> 
>> 
>> Anyway, you can try moving the GC hook into the critical section. That should help.
> 
> This seems to work.

> > > Hi Erik,
> > > I have been playing with this patch for past a few days. Great work!
> > > I found that this patch seems to break an early assumption.
> > > We have a comment in JavaThread::exit() says:
> > > // We need to cache the thread name for logging purposes below as once
> > > // we have called on_thread_detach this thread must not access any oops.
> > > Then in method :
> > > ```
> > > 
> > > void Threads::remove(JavaThread* p, bool is_daemon)  {
> > >  ...
> > >   BarrierSet::barrier_set()->on_thread_detach(p);
> > > 
> > >   // Extra scope needed for Thread_lock, so we can check
> > >   // that we do not remove thread without safepoint code notice
> > >   { MonitorLocker ml(Threads_lock);
> > > ..
> > > }
> > > ```
> > > 
> > > 
> > > It calls barrier's on_thread_detach(), acquires Threads_lock.
> > > The lock acquisition triggers stack processing, that potential accesses oops.
> > > ```
> > > 
> > > V  [libjvm.so+0x10c6f5e]  StackWatermark::start_processing()+0x6a
> > > V  [libjvm.so+0x10c77e8]  StackWatermarkSet::start_processing(JavaThread*, StackWatermarkKind)+0x82
> > > V  [libjvm.so+0xfd7757]  SafepointMechanism::process(JavaThread*)+0x37
> > > V  [libjvm.so+0xfd796b]  SafepointMechanism::process_if_requested_slow(JavaThread*)+0x1d
> > > V  [libjvm.so+0x4b3683]  SafepointMechanism::process_if_requested(JavaThread*)+0x2b
> > > V  [libjvm.so+0xe87f0d]  ThreadBlockInVMWithDeadlockCheck::~ThreadBlockInVMWithDeadlockCheck()+0x5f
> > > V  [libjvm.so+0xe86700]  Mutex::lock_contended(Thread*)+0x12c
> > > V  [libjvm.so+0xe867d8]  Mutex::lock(Thread*)+0x96
> > > V  [libjvm.so+0xe86823]  Mutex::lock()+0x23
> > > V  [libjvm.so+0x29b4bc]  MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)+0xe2
> > > V  [libjvm.so+0x29b533]  MonitorLocker::MonitorLocker(Monitor*, Mutex::SafepointCheckFlag)+0x29
> > > V  [libjvm.so+0x119f2ce]  Threads::remove(JavaThread*, bool)+0x56
> > > V  [libjvm.so+0x1198a2b]  JavaThread::exit(bool, JavaThread::ExitType)+0x905
> > > V  [libjvm.so+0x1197fde]  JavaThread::post_run()+0x22
> > > V  [libjvm.so+0x1193eae]  Thread::call_run()+0x230
> > > V  [libjvm.so+0xef3e38]  thread_native_entry(Thread*)+0x1e4
> > > ```
> > > 
> > > 
> > > This is a problem for Shenandoah, as it flushes SATB buffers during on_thread_detach() and does not expect to see any
> > > more SATB traffic. Thanks.
> > 
> > 
> > What oop are you encountering here? You should have no frames left at this point, and all oops should have been
> > cleared. At least that is the theory, and that was why the thread oop moved out from the thread (to enforce that). So I
> > am curious what oop you have found to still be around at this point.
> 
> They are handles:
> 
> ```
> 
> V  [libjvm.so+0x98d693]  chunk_oops_do(OopClosure*, Chunk*, char*)+0xce
> V  [libjvm.so+0x98d6d3]  HandleArea::oops_do(OopClosure*)+0x37
> V  [libjvm.so+0x1194ad0]  Thread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x88
> V  [libjvm.so+0x119b0c8]  JavaThread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x88
> ```
> 
> > Anyway, you can try moving the GC hook into the critical section. That should help.
> 
> This seems to work.

Okay, I will move it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/296