RFC: Safe OOM during evac
Roman Kennke
rkennke at redhat.com
Fri Oct 20 13:18:30 UTC 2017
Do GC workers update stuff in oops after evacuation? If so, then yes, we
need to make GC workers follow this protocol too. I haven't done this in my
prototype because GC workers during evacuation do not actually write to
the evacuated oops, and therefore don't care.
Roman
> Okay. Sounds like string dedup could still update from-space
> oops without GC workers doing CAS-masking.
>
> But I will likely remove this UR phase to eliminate the dependency on
> the 2nd bitmap, anyway.
>
> Thanks,
>
> -Zhengyu
>
>
>
> On 10/20/2017 08:49 AM, Roman Kennke wrote:
>> I guess we could make it so.
>> Currently I'm only doing the CAS-masking trick on Java threads. We'd
>> have to do it for any thread (incl. GC workers).
>> And we'd have to make this the default behaviour.
>> Then yes, we could do that.
>>
>> Roman
>>
>>> Hi Roman,
>>>
>>> With this patch, we should not need "fixup_roots", right?
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>> On 10/20/2017 05:20 AM, Roman Kennke wrote:
>>>> On 19.10.2017 at 12:27, Roman Kennke wrote:
>>>>> Hi all,
>>>>>
>>>>> I want to outline the problem that we have with OOM during
>>>>> evacuation, and summarize what we have so far in order to handle
>>>>> OOM during evac correctly, and describe one way that I have in
>>>>> mind how to do it.
>>>>>
>>>>> The problem appears when a Java thread gets into a write-barrier
>>>>> and fails to evacuate the object because it has run out of memory
>>>>> (e.g. both GCLAB and shared evac exhausted). In this case we still
>>>>> need to ensure that we return a single object, even if it's in
>>>>> from-space, otherwise we risk inconsistency (a subsequent write may
>>>>> end up in the wrong object copy). However, there might still be
>>>>> another Java thread which succeeds in evacuating that same object at
>>>>> about the same time because it still has GCLAB left.
>>>>>
>>>>> Here's what we came up with so far in IRC discussions:
>>>>>
>>>>> - Throw OOME. This is the absolute minimum solution, and we should
>>>>> probably just do that right now until we have implemented a better
>>>>> one (and we might even use it as fallback for solutions that are not
>>>>> 100% safe). This is better IMO than pretending we're ok and risking
>>>>> heap inconsistencies.
>>>>>
>>>>> - Make write-barrier slow-path/runtime-calls non-leaf calls. Then
>>>>> we could just safepoint and do a full-GC *while we're in the
>>>>> barrier*. This would be the most correct solution. Unfortunately it
>>>>> would make it very hard to optimize the write-barriers in
>>>>> C2, and the performance impact is likely not acceptable. We may
>>>>> try to do a prototype (again) and see how far Roland can take it
>>>>> though. The problem here is that we need debug info at the call
>>>>> sites, and C2 maintains debug info only at certain points in the
>>>>> ideal graph. Consequently, we can move write-barriers only to such
>>>>> points and not as freely as we can do now.
>>>>>
>>>>> - Keep an evacuation reserve that we use only for evacuations or
>>>>> maybe even only for write-barriers or maybe even only as fallback
>>>>> for write-barriers that OOM'ed. This very likely solves it for
>>>>> 99.999.. % of the cases, but discussions on IRC have shown that it
>>>>> is very hard to come up with a 100% safe upper bound for this
>>>>> reserve size that allows us to theoretically prove that OOM
>>>>> during evac cannot ever happen. We might combine this with
>>>>> solution #1 though: i.e. make it safe in all but the most extreme
>>>>> pathological cases, and throw OOME if we hit a wall. I am still not
>>>>> very happy with the prospect of failing in extremely rare cases,
>>>>> possibly in production environments under high pressure.
>>>>>
>>>>> - Extend the brooks pointer protocol to prevent concurrent evacs.
>>>>> Let me outline my idea here:
>>>>>
>>>>> If a write barrier runs OOM, we need to prevent other threads from
>>>>> successfully evacuating 'our' object. We can do so by CASing an
>>>>> 'impossible' value into its brooks pointer: this guarantees that
>>>>> either other threads fail to install a brooks ptr, *OR* our CAS
>>>>> fails and gives us the other thread's copy (which would be fine
>>>>> too). Problem: we need to deal with that special value everywhere
>>>>> else, most importantly in read-barriers. The best thing I could
>>>>> come up with so far is to use $OBJECT_ADDR | 1 as blocker value,
>>>>> i.e. CAS-set the lowest bit in the self-pointing brooks ptr. This
>>>>> can easily be decoded in read-barriers (and all other relevant
>>>>> code) by masking out that lowest bit using AND ~1. Full-GC would
>>>>> reset the brooks ptr to its normal value. I don't have a good
>>>>> feeling for what the performance impact would be. Something
>>>>> similar happens for decoding compressed oops, and that is commonly
>>>>> accepted (but is less frequent). The actual brooks-ptr-load
>>>>> probably dominates the masking, so we might not even notice? On
>>>>> the upside, this makes the oom_during_evac() path truly
>>>>> non-blocking: we don't need to wait for GC workers, for other Java
>>>>> threads, for evacuation to be turned off, or any such thing (which
>>>>> also means it truly complies with the non-blocking requirement for
>>>>> leaf calls). I believe it's a correct solution too: no from-space
>>>>> copy can slip through it. I can imagine coming up with a prototype
>>>>> for this and making it optional (by a flag) so that we can measure
>>>>> its impact or even combine it with any of the other options we
>>>>> have (e.g. evac-reserve).
>>>>>
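The blocker-bit idea in the last bullet can be modelled in a few lines.
What follows is only a toy sketch in Python, not the HotSpot code: the
Heap class, the lock that stands in for a hardware CAS, and the function
names read_barrier and evacuate are all assumptions made for
illustration (only oom_during_evac is a name taken from the proposal).
Addresses are assumed to be at least 2-byte aligned so that the lowest
bit is free.

```python
import threading

BLOCKED_BIT = 1  # lowest bit of the brooks ptr; assumes aligned addresses


class Heap:
    """Toy heap: maps an object's address to its brooks (forwarding) pointer."""

    def __init__(self):
        self._fwd = {}
        self._lock = threading.Lock()  # stands in for a hardware CAS

    def new_object(self, addr):
        assert addr & BLOCKED_BIT == 0, "address must have a clear low bit"
        self._fwd[addr] = addr  # a fresh object forwards to itself

    def cas_fwd(self, addr, expected, new):
        """Set fwd[addr] = new if it equals expected; return the witnessed value."""
        with self._lock:
            old = self._fwd[addr]
            if old == expected:
                self._fwd[addr] = new
            return old

    def load_fwd(self, addr):
        return self._fwd[addr]


def read_barrier(heap, addr):
    """Resolve an object, masking out the blocker bit (the 'AND ~1' step)."""
    return heap.load_fwd(addr) & ~BLOCKED_BIT


def evacuate(heap, addr, to_space_copy):
    """Try to install to_space_copy as forwardee; return the canonical address."""
    witnessed = heap.cas_fwd(addr, addr, to_space_copy)
    if witnessed == addr:
        return to_space_copy  # our copy won
    # We lost: either another thread's copy is installed, or the object is
    # blocked and stays in from-space. A real collector would also have to
    # roll back its speculative copy here.
    return witnessed & ~BLOCKED_BIT


def oom_during_evac(heap, addr):
    """Write barrier ran out of memory: block further evacuation of addr."""
    witnessed = heap.cas_fwd(addr, addr, addr | BLOCKED_BIT)
    # Either we blocked the object (result is addr itself), or somebody
    # evacuated it first and we simply use their copy.
    return witnessed & ~BLOCKED_BIT


if __name__ == "__main__":
    heap = Heap()
    heap.new_object(0x1000)
    print(hex(oom_during_evac(heap, 0x1000)))   # thread A hits OOM first
    print(hex(evacuate(heap, 0x1000, 0x9000)))  # thread B tries afterwards
    print(hex(read_barrier(heap, 0x1000)))      # every reader agrees
```

Whichever CAS lands first decides the outcome, and every later caller
resolves to that same address, which is the "single object" property
the write barrier needs. Note that in this model any thread that tries
to install a forwardee, including a GC worker, has to treat the blocked
value as a lost race.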
>>>>>
>>>> So, I made a prototype for this and ran SPECjvm with and without it.
>>>>
>>>> First with a clean checkout build:
>>>>
>>>> https://paste.fedoraproject.org/paste/8W8tKz5WGlvaT5iR5cUbFA
>>>>
>>>> And this with additional masking in the read barrier:
>>>>
>>>> https://paste.fedoraproject.org/paste/WT4TRm25gAbdsYSZfJqt4g
>>>>
>>>> First some things to notice:
>>>>
>>>> - compiler regularly crashes with an NPE. Running with
>>>> -ShenandoahOptimizeFinals plus Roland's recent patch for this seems
>>>> to make it go away. I ran all benchmarks with that patch and flag
>>>> applied.
>>>> - serial's performance pattern is totally erratic, with huge
>>>> variance between 3K and 12K. We can disregard this number for now,
>>>> but need to look into it.
>>>> - XML crashes hard inside a C2 compiled method.
>>>>
>>>> Other than that, I see no significant impact of the masking read
>>>> barrier.
>>>>
>>>> We also might want to run some memory-reading gcbench tests. In
>>>> case anybody wants to try that, here is the patch:
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/safe-oom-during-evac/webrev.00/
>>>>
>>>> If nobody screams stop, I'm going to add some test machinery
>>>> (+ShenandoahOOMDuringEvacALot) and additional testcases (probably
>>>> hook up to gcold and gcbasher), and then RFR/RFC the patch.
>>>>
>>>> Thoughts?
>>>>
>>>> Roman
>>>>
>>
More information about the shenandoah-dev
mailing list