RFR/RFC: Make OOM-during-evacuation race-free
Roman Kennke
rkennke at redhat.com
Tue Oct 24 07:56:33 UTC 2017
So here comes some data. I ran gcbench read-tests on a server machine.
Plain reads, baseline:
http://cr.openjdk.java.net/~rkennke/safe-oom-during-evac/gcbench-old-plain.txt
Plain reads, patched:
http://cr.openjdk.java.net/~rkennke/safe-oom-during-evac/gcbench-new-plain.txt
Volatile reads, baseline:
http://cr.openjdk.java.net/~rkennke/safe-oom-during-evac/gcbench-old-volatile.txt
Volatile reads, patched:
http://cr.openjdk.java.net/~rkennke/safe-oom-during-evac/gcbench-new-volatile.txt
Roman
> On 10/23/2017 05:59 PM, Roman Kennke wrote:
>> Yes I see all that. But we have found out that this is a correctness issue, and that trumps
>> performance, even if it's just a very minuscule case.
> This is the correctness issue on the cancellation path again. And we have lots of band-aids there
> already, and this is yet another band-aid. What makes it different from other band-aids is that it
> touches the code we *know* is performance critical. It is a nice exercise, but a band-aid
> nevertheless. Noisy performance data may lull us into believing the performance impact is okay.
>
>> If we can come up with another solution that makes running OOM-during-evac 100% I'm all for it. I'm
>> not fixed on my proposal, I just wanted to throw it out for discussion and bring something on the
>> table that we can do some performance tests with.
> This fwdptr mangling stuff is maybe our fallback plan, if, say, the reservation scheme does not work
> out -- a scheme that would make the whole cancellation issue go away.
>
> It makes little sense in my mind to allocate time for fallback plans that have bad theoreticals
> before we work out and try the fix that has good theoreticals. We are still at the stage in the
> project when we don't have to rush the intrusive band-aids out. We can actually take time to
> reimplement parts of the collector solving the issue "properly".
>
> I do wonder if instead of mangling the bits, we could reserve a "shadow" uncommitted memprotected
> heap, and set the fwdptr to that? Then we can intercept the SEGVs coming to that shadow heap, and
> redirect them to proper objects. This leaves the usual codepath the same, without ANDs, and the
> failure path would experience read storms -- but why would that matter, if we are on the failure path?
>
> -Aleksey
>
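To make the trade-off in the quoted discussion concrete, here is a toy model of what "mangling the bits" of a forwarding pointer could look like, and why it puts an AND on every read. This is only a sketch under assumptions: addresses are plain integers, objects are assumed to be 8-byte aligned, and the tag bit and the names mangle, is_mangled and resolve are hypothetical, not taken from the patch or from the Shenandoah sources.

```python
# Toy model of a tagged forwarding pointer. Addresses are plain integers.
# Assumption: objects are 8-byte aligned, so the low 3 bits of a real
# forwarding pointer are always zero and are free to carry a tag.
ALIGNMENT = 8
TAG_MASK = ALIGNMENT - 1
EVAC_FAILED = 0x1  # hypothetical tag bit, not taken from the patch


def mangle(fwdptr: int) -> int:
    """Set the hypothetical failure tag in a forwarding pointer."""
    assert fwdptr % ALIGNMENT == 0, "expected an aligned, untagged pointer"
    return fwdptr | EVAC_FAILED


def is_mangled(fwdptr: int) -> bool:
    """Report whether the hypothetical failure tag is set."""
    return (fwdptr & EVAC_FAILED) != 0


def resolve(fwdptr: int) -> int:
    """Strip the tag bits. In this model every read pays for this AND,
    whether or not the pointer was ever mangled."""
    return fwdptr & ~TAG_MASK
```

In this model, resolve() is the extra work on the common read path that the benchmark runs are meant to measure. The shadow-heap idea would leave the read path as a plain load and move all of the cost into a fault handler on the failure path.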