RFR (S): Fix races on full GC request

Roman Kennke rkennke at redhat.com
Mon Dec 12 15:20:12 UTC 2016


Hi,


>  173 void ShenandoahConcurrentThread::do_full_gc(GCCause::Cause cause) {
>  175   assert(Thread::current()->is_Java_thread(), "expect Java thread here");
>  176
>  177   MonitorLockerEx ml(&_full_gc_lock);
>  178   schedule_full_gc(); // sets _do_full_gc = true
>  179   _full_gc_cause = cause;
>  180
>  181   // Now that full GC is scheduled, we can abort everything else
>  182   ShenandoahHeap::heap()->cancel_concgc(cause);
>  183
>  184   while (_do_full_gc) {
>  185     ml.wait();
>  186     OrderAccess::storeload();
>  187   }
>  188   assert(!_do_full_gc, "expect full GC to have completed");
>  189 }
> 
> If there is a thread that blocked on _full_gc_lock when Full GC had
> started, but re-entered after Full GC completed, it would try to
> schedule a full GC / cancel conc GC again! This mostly happens when
> full GCs are really short.
> 
> In our current code, this also fails the assert in the Shenandoah
> control thread that every cancellation should have a reason, like an
> impending full GC. This interesting result occurs because there are
> racy unlocked reads of _do_full_gc in the assertion code.
> 
> Both are solved by turning _do_full_gc updates atomic/lock-free, and
> using the lock only for wait/notifies:
>   http://cr.openjdk.java.net/~shade/shenandoah/cancel-races-again/webrev.02/

Looks good to me.
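
For the record, the pattern described above (atomic flag updates, lock
only for wait/notify) boils down to something like the sketch below.
This is a self-contained illustration in standard C++ (std::atomic plus
a condition variable), not the actual HotSpot code, which uses
VM-internal primitives like Atomic and MonitorLockerEx; the class and
helper names here are made up for the example. The key point is that
only the thread whose CAS flips the flag does the scheduling and
cancellation, so a thread that re-enters after the full GC completed
falls straight through the wait instead of re-triggering anything:

  // Sketch of the pattern only, NOT the actual HotSpot code:
  // the flag is updated lock-free, the lock only guards wait/notify.
  #include <atomic>
  #include <condition_variable>
  #include <mutex>

  class FullGCRequest {   // hypothetical name, for illustration
    std::atomic<bool> _do_full_gc{false};
    std::mutex _full_gc_lock;              // wait/notify only
    std::condition_variable _full_gc_cv;

  public:
    void do_full_gc() {
      bool expected = false;
      // Only the thread that flips false -> true schedules the full GC
      // (and would cancel concurrent GC); latecomers skip straight to
      // waiting, so a completed full GC cannot be re-triggered.
      if (_do_full_gc.compare_exchange_strong(expected, true)) {
        // schedule_full_gc(); cancel_concgc(cause);
      }
      std::unique_lock<std::mutex> ml(_full_gc_lock);
      _full_gc_cv.wait(ml, [this] { return !_do_full_gc.load(); });
    }

    // Control thread calls this once the full GC cycle is done.
    void notify_full_gc_done() {
      std::lock_guard<std::mutex> lg(_full_gc_lock);
      _do_full_gc.store(false);  // cleared under the lock: no lost wakeup
      _full_gc_cv.notify_all();
    }
  };

Clearing the flag and notifying under the same lock the waiters use
avoids a lost wakeup, and the CAS makes scheduling idempotent, which is
exactly what the re-entering thread needed.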


> P.S. I swear to God, another race there, and I will burn the entire
> termination
> protocol thing down.

:-D

I can't count how many races and strange conditions we have already
fixed there. One entire problem class went *puff* away when Zhengyu
suggested simplifying JNI critical regions. I seriously hope this is
the last one. Otherwise, do we simply give up on terminating? :-P

Roman

