RFR: Implement protocol for safe OOM during evacuation handling

Roman Kennke rkennke at redhat.com
Mon Mar 5 15:29:38 UTC 2018


Am 05.03.2018 um 16:08 schrieb Zhengyu Gu:
> Hi Roman,
> 
>> ShenandoahOOMDuringEvacScope
>> - The protocol has been designed to allow repeated calls into
>> evac_object even if OOM-during-evac is active:
>>    - workers may work in strides. OOMing at one object doesn't mean it's
>> not attempted again for the next
>>    - write-barriers may return, and go into another write-barrier before
>> reaching a safepoint
>>
>> ... it is ok to do that, the protocol will lead to simple+safe RB when
>> calling into evac_object() again.
>>
>> - There are situations when we need to *leave* the scope. Most
>> importantly, workers (in partial and traversal) need to signal the
>> terminator that they are ready, which will cause them to wait other
>> workers to finish, and in which they will *not* be able to give up the
>> OOM-counter. We must leave the scope before signalling the terminator,
>> and we have ShenandoahOOMDuringEvacScopeLeaver for that. There are a few
>> other situations where we need to leave the scope to avoid nested
>> scoping. Leaving the scope like this is ok because of the above
>> mentioned design to allow repeated calls into the protocol.
> 
> This sounds suspicious ... you have counter that drops to 0, then comes
> back up, I think there can have race here.
> 
> shenandoahTraversalGC.cpp
> 
>  483     for (uint i = 0; i < stride; i++) {
>  484       if ((q->pop_buffer(task) ||
>  485            q->pop_local(task) ||
>  486            q->pop_overflow(task) ||
>  487            (DO_SATB &&
> satb_mq_set.apply_closure_to_completed_buffer(&satb_cl) &&
> q->pop_buffer(task)) ||
>  488            queues->steal(worker_id, &seed, task))) {
>  489         conc_mark->do_task<T, true>(q, cl, live_data, &task);
>  490       } else {
>  491         ShenandoahOOMDuringEvacScopeLeaver oom_scope_leaver;
>  492         if (terminator->offer_termination()) return;
>  493       }
> 
> 
> E.g. L#491 counter drops to 0 -> WB returns -> fails to terminate -> it
> can evacuate again?
> 

I don't think so. After the protocol is done, it will keep up the
OOM_MARKER_MASK until it's cleared during safepoint. This is checked
upon entry of the protocol, and does a fast-path-return after setting up
the flags to return with RB.

If it happens to enter repeatedly while the protocol is in progress, it
will participate in it as normal, e.g. will be blocked to enter until
the protocol is done, and then observe the OOM_MARKER_MASK and return
with RB as above.

It must be ok to enter into the protocol repeatedly without race. If you
find a race please describe it to me.

Roman




More information about the shenandoah-dev mailing list