RFR/RFC: Make OOM-during-evacuation race-free
Roman Kennke
rkennke at redhat.com
Mon Oct 23 16:31:47 UTC 2017
On 23.10.2017 at 18:22, Aleksey Shipilev wrote:
> On 10/23/2017 05:59 PM, Roman Kennke wrote:
>> Yes I see all that. But we have found out that this is a correctness issue, and that trumps
>> performance, even if it's just a very minuscule case.
> This is the correctness issue on the cancellation path again. And we have lots of band-aids there
> already, and this is yet another band-aid.
I would disagree with that. It is a fix. It makes the OOM race problem
disappear, and it also makes the write-barrier leaf-call issue
disappear. But yes, I am not disputing that it's fairly intrusive and
potentially performance-damaging. I'll try to get some gc-bench numbers.
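
To make the trade-off concrete, here is a rough sketch of the kind of
forwarding-pointer mangling under discussion. It is not the actual patch:
the low-bit encoding, the Obj layout, and the names kOomBit, resolve,
try_forward and mark_evac_failed are all assumptions made up for
illustration. It only shows why the normal read path picks up an extra AND,
and how a CAS can settle the race between a thread that evacuates and a
thread that ran out of memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed encoding (hypothetical): bit 0 of the forwarding word means
// "evacuation of this object failed, it stays where it is".
constexpr std::uintptr_t kOomBit = 1;

struct Obj {
    std::atomic<std::uintptr_t> fwd;  // forwarding word, initially the object itself
    int payload;
};

inline void init_obj(Obj* o, int payload) {
    o->fwd.store(reinterpret_cast<std::uintptr_t>(o), std::memory_order_relaxed);
    o->payload = payload;
}

// Normal read path: this mask is the extra AND on the hot path.
inline Obj* resolve(const Obj* o) {
    return reinterpret_cast<Obj*>(o->fwd.load(std::memory_order_acquire) & ~kOomBit);
}

// Try to publish `copy` as the new location of `from`.
// Returns whichever location won the race.
inline Obj* try_forward(Obj* from, Obj* copy) {
    assert((reinterpret_cast<std::uintptr_t>(copy) & kOomBit) == 0);
    std::uintptr_t expected = reinterpret_cast<std::uintptr_t>(from);
    if (from->fwd.compare_exchange_strong(expected,
                                          reinterpret_cast<std::uintptr_t>(copy),
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
        return copy;
    }
    // Lost: someone else forwarded it, or marked it as failed.
    return reinterpret_cast<Obj*>(expected & ~kOomBit);
}

// Called when allocating the copy failed: pin the object in place so that
// no later thread can evacuate it. Returns whichever location won the race.
inline Obj* mark_evac_failed(Obj* from) {
    std::uintptr_t self = reinterpret_cast<std::uintptr_t>(from);
    std::uintptr_t expected = self;
    from->fwd.compare_exchange_strong(expected, self | kOomBit,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire);
    // On success `expected` is still `self`; on failure it holds the winner.
    return reinterpret_cast<Obj*>(expected & ~kOomBit);
}
```

In this sketch every caller ends up agreeing on a single location for the
object, whichever of the two CASes lands first. The cost is the mask in
resolve(), which sits on the path we know is performance critical.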
> What makes it different from other band-aids is that it
> touches the code we *know* is performance critical. It is a nice exercise, but a band-aid
> nevertheless. Noisy performance data may lull us into believing the performance impact is okay.
>
>> If we can come up with another solution that makes OOM-during-evac 100% race-free, I'm all for it.
>> I'm not set on my proposal; I just wanted to throw it out for discussion and bring something to
>> the table that we can do some performance tests with.
> This fwdptr mangling stuff is maybe our fallback plan, if, say, the reservation scheme does not
> work out -- that scheme would make the whole cancellation issue go away.
>
> It makes little sense in my mind to allocate time for fallback plans that have bad theoreticals
> before we work out and try the fix that has good theoreticals. We are still at the stage in the
> project where we don't have to rush the intrusive band-aids out. We can actually take time to
> reimplement parts of the collector, solving the issue "properly".
>
> I do wonder if, instead of mangling the bits, we could reserve a "shadow" uncommitted memprotected
> heap, and set the fwdptr to that? Then we can intercept the SEGVs coming to that shadow heap, and
> redirect them to the proper objects. This leaves the usual codepath the same, without ANDs, and the
> failure path would experience read storms -- but why would that matter, if we are on the failure path?
That sounds very interesting too. I wonder how that redirection would
work though. I.e. how would you get the correct oop and patch it into
the failing code path and return...
In the meantime I'll extract the ShenandoahOOMDuringEvacALot part and
post it for RFR; this seems useful in any case and should not be
controversial.
Roman