Fixing the OOM-during-evac
Roman Kennke
rkennke at redhat.com
Wed Feb 28 13:53:49 UTC 2018
Here's my current prototype which seems to pass initial tests with
-XX:+ShenandoahOOMDuringEvacALot
http://cr.openjdk.java.net/~rkennke/safe-oom-during-evac-counter.patch
It's slightly dirty. It's likely to be slow because it currently
enters/leaves the protected section for each object, even for GC
threads, which should not happen.
Roman
On Wed, Feb 28, 2018 at 12:42 PM, Roman Kennke <rkennke at redhat.com> wrote:
> While implementing the prototype, I came upon an issue with the
> protocol: if we get the OOM marker into the counter, we lose the
> actual counter.
>
> The solution is to not CAS a full special value, but mask the current
> counter with an extra bit and handle/mask that accordingly.
>
> Roman
>
> On Wed, Feb 28, 2018 at 11:42 AM, Roman Kennke <rkennke at redhat.com> wrote:
>> This issue keeps haunting me. :-)
>> Over coffee, I had an idea how to solve it. Let me outline it and open
>> it for discussion.
>>
>> The issue is that when a Java thread hits OOM while in the
>> write-barrier, another thread (Java or GC) may still succeed in
>> evacuating the object. This is racy, because thread#1 may get a
>> from-space copy and write to this, while other threads may get a
>> to-space copy and write to that.
>>
>> We need to prevent any other thread from evacuating our failed-to-evac
>> object, or else safely get the other copy.
>>
>> My idea is to have a counter for number of threads in the evacuation
>> path, and as soon as we hit OOM there, wait until the counter drops to
>> zero, at which point we can be sure to not get the object evacuated
>> under our feet.
>>
>> We need to protect the evacuation path with the following protocol.
>> 'The evacuation path' is the code around actual evacuation, i.e.
>> inside the evac-in-progress- and cset-checks, but around the actual
>> evac. This needs to be done both in fast- and slow-path.
>>
>> There is a global counter that shows the number of threads inside the
>> evac-path, OR a special value (e.g. something negative) to indicate
>> OOM-during-evac (i.e. no threads are allowed to enter the path).
>>
>> Upon entry of the evac-path, any threads will attempt to increase the
>> counter, using a CAS. Depending on the result of the CAS:
>> - success: carry on with evac
>> - failure:
>>   - if offending value is a valid counter, then try again
>>   - if offending value is OOM-during-evac special value: loop until
>>     counter drops to 0, then exit with read-barrier
>>
>> Upon exit, any threads will decrease the counter using atomic dec.
>>
>> Upon OOM-during-evac, any thread will attempt to CAS OOM-during-evac
>> special value into the counter. Depending on result:
>> - success: busy-loop until counter drops to zero, then exit with RB
>> - failure:
>>   - offender is valid counter update: try again
>>   - offender is OOM-during-evac: busy loop until counter drops to
>>     zero, then exit with RB
>>
>> For Java threads, this protocol needs to be done in the fast
>> (assembly) path too, because they can cause evacs. Or else, we could
>> decide to disable the fast-path altogether (I was never really sure if
>> the extra assembly did us much good).
>>
>> GC threads don't have to protect every single evacuation, but can
>> instead do the protocol wholesale: i.e. enter on worker start, and
>> exit on worker done.
>>
>> Please think hard about this possible solution and try to poke holes
>> in it. Meanwhile, I'll come up with a prototype.
>>
>> Cheers, Roman
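
The counter protocol discussed in this thread, including the follow-up
idea of marking OOM with an extra bit rather than a separate special
value, can be sketched roughly as follows. This is a minimal standalone
C++ illustration and not the code from the linked patch: the names
EvacOomCounter, EvacScope, try_enter, leave and handle_oom are
hypothetical, std::atomic stands in for HotSpot's own atomics, and the
OOM bit is set with fetch_or instead of an explicit CAS loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical sketch: low 31 bits count threads inside the evac path,
// the top bit marks OOM-during-evac without destroying the count.
class EvacOomCounter {
  static constexpr uint32_t OOM_BIT = 1u << 31;
  static constexpr uint32_t COUNT_MASK = ~OOM_BIT;
  std::atomic<uint32_t> _state{0};

  // Busy-wait until no thread is left inside the evac path.
  void wait_until_no_evac_threads() const {
    while ((_state.load() & COUNT_MASK) != 0) {
      std::this_thread::yield();
    }
  }

public:
  // Returns true if the caller may evacuate. Returns false if OOM was
  // flagged; the caller must then only resolve the object (read barrier).
  bool try_enter() {
    uint32_t cur = _state.load();
    for (;;) {
      if (cur & OOM_BIT) {
        wait_until_no_evac_threads();
        return false;
      }
      // On failure, cur is reloaded with the offending value and we retry.
      if (_state.compare_exchange_weak(cur, cur + 1)) {
        return true;
      }
    }
  }

  // Leave the evac path after a successful try_enter().
  void leave() {
    uint32_t prev = _state.fetch_sub(1);
    assert((prev & COUNT_MASK) != 0 && "leave() without matching enter");
    (void)prev;
  }

  // Called by a thread that is inside the evac path and failed to allocate.
  // Flags OOM, removes itself from the count, then waits for the others.
  void handle_oom() {
    _state.fetch_or(OOM_BIT);
    leave();
    wait_until_no_evac_threads();
  }

  uint32_t threads_in_evac() const { return _state.load() & COUNT_MASK; }
  bool oom_flagged() const { return (_state.load() & OOM_BIT) != 0; }
};

// Hypothetical RAII helper: a GC worker could hold one scope for its whole
// run instead of entering and leaving for every object.
class EvacScope {
  EvacOomCounter& _counter;
  bool _entered;

public:
  explicit EvacScope(EvacOomCounter& c) : _counter(c), _entered(c.try_enter()) {}
  EvacScope(const EvacScope&) = delete;
  EvacScope& operator=(const EvacScope&) = delete;
  ~EvacScope() { if (_entered) _counter.leave(); }

  bool entered() const { return _entered; }

  // Report an allocation failure; the scope no longer counts as entered.
  void oom() {
    if (_entered) {
      _entered = false;
      _counter.handle_oom();
    }
  }
};
```

In this sketch a Java thread would wrap each evacuation attempt in
try_enter()/leave(), while a GC worker could hold a single EvacScope for
its whole run. When try_enter() returns false, the caller does not
evacuate and instead resolves the object through the read barrier.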