Fixing the OOM-during-evac

Roman Kennke rkennke at redhat.com
Wed Feb 28 11:42:55 UTC 2018


While implementing the prototype, I came upon an issue with the
protocol: if we CAS the OOM marker into the counter, we lose the
actual counter value.
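
To make the problem concrete, here is a small single-threaded C++
sketch. The names OOM_MARKER and counter_after_oom are made up for
illustration and are not from the Shenandoah sources, and -1 is only
an assumed choice of special value. With three threads in the path,
the marker overwrites the 3, and the three atomic decrements on exit
take the counter to -4 rather than 0, so a waiter never sees zero.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical special value meaning "OOM-during-evac" (assumption).
constexpr int OOM_MARKER = -1;

// Returns the counter value after `threads_in_path` threads have entered,
// one thread has installed the OOM marker with a CAS, and all entered
// threads have left again with an atomic decrement.
inline int counter_after_oom(int threads_in_path) {
    std::atomic<int> counter{0};
    for (int i = 0; i < threads_in_path; i++) {
        counter.fetch_add(1);            // a thread enters the evac path
    }
    int expected = threads_in_path;
    bool installed = counter.compare_exchange_strong(expected, OOM_MARKER);
    assert(installed);                   // no contention here, so the CAS succeeds
    (void)installed;
    for (int i = 0; i < threads_in_path; i++) {
        counter.fetch_sub(1);            // a thread leaves the evac path
    }
    return counter.load();               // the original count is gone
}
```

counter_after_oom(3) yields -4: the information that three threads
were still in the path was destroyed by the CAS.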

The solution is to not CAS a full special value, but to set an extra
bit in the current counter value and mask that bit out wherever the
count itself is read.
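
A minimal sketch of what that could look like, in C++ with
std::atomic. Everything here (EvacCounter, OOM_BIT, the method names,
using the top bit of a 32-bit word, and fetch_or instead of a CAS
loop) is an assumption for illustration, not the prototype code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical counter: low 31 bits count threads in the evac path,
// the top bit flags OOM-during-evac.
class EvacCounter {
    static constexpr std::uint32_t OOM_BIT = 1u << 31;
    std::atomic<std::uint32_t> state{0};

public:
    // Try to enter the evac path. Returns false once OOM has been
    // flagged; the caller must then not evacuate.
    bool try_enter() {
        std::uint32_t cur = state.load();
        while ((cur & OOM_BIT) == 0) {
            // On failure the CAS reloads `cur`, so the OOM bit is re-checked.
            if (state.compare_exchange_weak(cur, cur + 1)) {
                return true;
            }
        }
        return false;
    }

    // Leave the evac path. The OOM bit survives the decrement because
    // the count part is non-zero for every thread that entered.
    void leave() {
        std::uint32_t prev = state.fetch_sub(1);
        assert((prev & ~OOM_BIT) > 0);
        (void)prev;
    }

    // Flag OOM without disturbing the count.
    void signal_oom() { state.fetch_or(OOM_BIT); }

    bool oom_flagged() const { return (state.load() & OOM_BIT) != 0; }

    std::uint32_t threads_in_path() const { return state.load() & ~OOM_BIT; }

    // Busy-wait until no thread is left in the evac path.
    void wait_until_drained() const {
        while (threads_in_path() != 0) {
            std::this_thread::yield();
        }
    }
};
```

A thread that hits OOM would call signal_oom(), then leave(), then
wait_until_drained() before resolving the object through the read
barrier. That ordering is an assumption of this sketch.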

Roman

On Wed, Feb 28, 2018 at 11:42 AM, Roman Kennke <rkennke at redhat.com> wrote:
> This issue keeps haunting me. :-)
> Over coffee, I had an idea how to solve it. Let me outline it and open
> for discussion.
>
> The issue is that when a Java thread hits OOM while in the
> write-barrier, another thread (Java or GC) may still succeed in
> evacuating the object. This is racy, because thread#1 may get the
> from-space copy and write to it, while other threads may get the
> to-space copy and write to that.
>
> We need to prevent any other thread from evacuating our failed-to-evac
> object, or else safely get the other copy.
>
> My idea is to have a counter for number of threads in the evacuation
> path, and as soon as we hit OOM there, wait until the counter drops to
> zero, at which point we can be sure to not get the object evacuated
> under our feet.
>
> We need to protect the evacuation path with the following protocol.
> 'The evacuation path' is the code around actual evacuation, i.e.
> inside the evac-in-progress- and cset-checks, but around the actual
> evac. This needs to be done both in fast- and slow-path.
>
> There is a global counter that shows the number of threads inside the
> evac-path, OR a special value (e.g. something negative) to indicate
> OOM-during-evac (i.e. no threads are allowed to enter the path).
>
> Upon entry to the evac-path, each thread attempts to increment the
> counter using a CAS. Depending on the result of the CAS:
> - success: carry on with evac
> - failure:
>   - if the offending value is a valid counter, then try again
>   - if the offending value is the OOM-during-evac special value: loop
>     until the counter drops to 0, then exit with read-barrier
>
> Upon exit, each thread decrements the counter using an atomic dec.
>
> Upon OOM-during-evac, the thread attempts to CAS the OOM-during-evac
> special value into the counter. Depending on the result:
> - success: busy-loop until the counter drops to zero, then exit with RB
> - failure:
>   - offender is a valid counter update: try again
>   - offender is OOM-during-evac: busy-loop until the counter drops to
>     zero, then exit with RB
>
> For Java threads, this protocol needs to be done in the fast
> (assembly) path too, because they can cause evacs. Or else, we could
> decide to disable the fast-path altogether (I was never really sure if
> the extra assembly did us much good).
>
> GC threads don't have to protect every single evacuation, but can
> instead do the protocol wholesale: i.e. enter on worker start, and
> exit on worker done.
>
> Please think hard about this possible solution and try to poke holes
> in it. Meanwhile, I'll come up with a prototype.
>
> Cheers, Roman
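
The wholesale enter/exit for GC workers described in the quoted
message could be sketched as a scoped guard in C++. The names
evac_threads, EvacScope and run_gc_worker are hypothetical, and the
OOM check on entry is left out to keep the example short.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical global count of threads inside the evac path.
static std::atomic<int> evac_threads{0};

// Enters the evac path on construction and leaves it on destruction,
// so a worker is counted once for its whole run.
class EvacScope {
public:
    EvacScope() { evac_threads.fetch_add(1); }
    ~EvacScope() {
        int prev = evac_threads.fetch_sub(1);
        assert(prev > 0);
        (void)prev;
    }
    EvacScope(const EvacScope&) = delete;
    EvacScope& operator=(const EvacScope&) = delete;
};

// Hypothetical worker body: enter on worker start, exit on worker done.
template <typename Work>
void run_gc_worker(Work evacuate_all) {
    EvacScope scope;   // counted for the duration of the worker
    evacuate_all();    // individual evacuations need no further protocol
}
```

While the worker runs, a thread waiting for the counter to drop to
zero keeps waiting; it is released only when the worker is done.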
