RFR: Implement protocol for safe OOM during evacuation handling

Zhengyu Gu zgu at redhat.com
Mon Mar 5 15:41:50 UTC 2018



On 03/05/2018 10:29 AM, Roman Kennke wrote:
> Am 05.03.2018 um 16:08 schrieb Zhengyu Gu:
>> Hi Roman,
>>
>>> ShenandoahOOMDuringEvacScope
>>> - The protocol has been designed to allow repeated calls into
>>> evac_object even if OOM-during-evac is active:
>>>     - workers may work in strides. OOMing at one object doesn't mean it's
>>> not attempted again for the next
>>>     - write-barriers may return, and go into another write-barrier before
>>> reaching a safepoint
>>>
>>> ... it is ok to do that, the protocol will lead to simple+safe RB when
>>> calling into evac_object() again.
>>>
>>> - There are situations when we need to *leave* the scope. Most
>>> importantly, workers (in partial and traversal) need to signal the
>>> terminator that they are ready, which will cause them to wait for other
>>> workers to finish, and in which they will *not* be able to give up the
>>> OOM-counter. We must leave the scope before signalling the terminator,
>>> and we have ShenandoahOOMDuringEvacScopeLeaver for that. There are a few
>>> other situations where we need to leave the scope to avoid nested
>>> scoping. Leaving the scope like this is ok because of the above
>>> mentioned design to allow repeated calls into the protocol.
>>
>> This sounds suspicious ... you have a counter that drops to 0, then comes
>> back up. I think there can be a race here.
>>
>> shenandoahTraversalGC.cpp
>>
>>   483     for (uint i = 0; i < stride; i++) {
>>   484       if ((q->pop_buffer(task) ||
>>   485            q->pop_local(task) ||
>>   486            q->pop_overflow(task) ||
>>   487            (DO_SATB && satb_mq_set.apply_closure_to_completed_buffer(&satb_cl) && q->pop_buffer(task)) ||
>>   488            queues->steal(worker_id, &seed, task))) {
>>   489         conc_mark->do_task<T, true>(q, cl, live_data, &task);
>>   490       } else {
>>   491         ShenandoahOOMDuringEvacScopeLeaver oom_scope_leaver;
>>   492         if (terminator->offer_termination()) return;
>>   493       }
>>
>>
>> E.g. L#491 counter drops to 0 -> WB returns -> fails to terminate -> it
>> can evacuate again?
>>
> 
> I don't think so. After the protocol is done, it will keep up the
> OOM_MARKER_MASK until it's cleared during safepoint. This is checked
> upon entry of the protocol, and does a fast-path-return after setting up
> the flags to return with RB.
> 
> If it happens to enter repeatedly while the protocol is in progress, it
> will participate in it as normal, e.g. will be blocked to enter until
> the protocol is done, and then observe the OOM_MARKER_MASK and return
> with RB as above.
> 
> It must be ok to enter into the protocol repeatedly without race. If you
> find a race please describe it to me.
> 
What I described above is an example, no?

L#491 drops the counter to 0, so the WB exits.
L#492 fails to terminate, and the worker re-enters the protocol. However, there
is no check whether the OOM protocol is in progress, so it goes back to L#484
and finds something ...
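
To make the scenario concrete, here is a minimal sketch of the protocol as I
understand it from the description above. It is not the actual Shenandoah
code: apart from OOM_MARKER_MASK, every name (EvacOomProtocol, enter, leave,
handle_oom, clear) is hypothetical, and I am assuming a single atomic word
that holds the counter in the low bits and the marker in the high bit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical, simplified model of the OOM-during-evac protocol.
// Only the name OOM_MARKER_MASK comes from the discussion.
class EvacOomProtocol {
  static constexpr uint32_t OOM_MARKER_MASK = 0x80000000u;
  // Low bits: number of threads inside the scope. High bit: OOM was seen.
  std::atomic<uint32_t> _state{0};

  // Spin until no thread is counted as being inside the scope.
  void wait_for_no_evac_threads() const {
    while ((_state.load(std::memory_order_acquire) & ~OOM_MARKER_MASK) != 0) {
      std::this_thread::yield();
    }
  }

public:
  // Returns true if the caller may evacuate, false if it may only resolve
  // through the RB (marker observed on entry).
  bool enter() {
    uint32_t cur = _state.load(std::memory_order_acquire);
    for (;;) {
      if ((cur & OOM_MARKER_MASK) != 0) {
        // Protocol in progress or finished: block until it is done,
        // then take the fast path without being counted.
        wait_for_no_evac_threads();
        return false;
      }
      if (_state.compare_exchange_weak(cur, cur + 1,
                                       std::memory_order_acq_rel)) {
        return true;
      }
      // CAS failed: cur now holds the fresh value, so re-check the marker.
    }
  }

  // Only for a thread whose enter() returned true and that did not OOM.
  void leave() {
    assert(threads_in_evac() > 0);
    _state.fetch_sub(1, std::memory_order_acq_rel);
  }

  // Called by a counted thread that failed to allocate: publish the marker,
  // give up its own count, and wait for the other threads to leave.
  void handle_oom() {
    _state.fetch_or(OOM_MARKER_MASK, std::memory_order_acq_rel);
    _state.fetch_sub(1, std::memory_order_acq_rel);
    wait_for_no_evac_threads();
  }

  // Models the safepoint that clears the marker.
  void clear() { _state.store(0, std::memory_order_release); }

  uint32_t threads_in_evac() const {
    return _state.load(std::memory_order_acquire) & ~OOM_MARKER_MASK;
  }
  bool oom_seen() const {
    return (_state.load(std::memory_order_acquire) & OOM_MARKER_MASK) != 0;
  }
};
```

In this model the leaver at L#491 corresponds to leave(), and coming back
after a failed offer_termination() at L#492 corresponds to another enter().
That second enter() either counts the worker again (no OOM so far) or sees
the marker and returns false. My question is whether the real code has an
equivalent check on that re-entry path.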

Am I missing something here?

Thanks,

-Zhengyu


> Roman
> 
> 

