RFC/RFR: Get rid of second bitmap

Tue Oct 10 09:20:49 UTC 2017

On 10.10.2017 at 00:44, Roman Kennke wrote:
> On 06.10.2017 at 18:41, Aleksey Shipilev wrote:
>> On 10/06/2017 02:27 PM, Roman Kennke wrote:
>>> AFAICT, the whole problem boils down to ShenandoahHeap::object_iterate()
>>> and related *public* methods being problematic when called at random
>>> times, in particular when the marking bitmap is not valid (e.g. marking
>>> aborted, bitmap just clearing/cleared, marking in progress).
>> Yes, exactly.
>>
>>> We could help it by squeezing in a marking pass before doing the
>>> iteration. However, if we do this, we can just as well report the
>>> visited objects to the ObjectClosure while traversing. It shouldn't
>>> matter for consumers of object_iterate() in which order the objects
>>> arrive, right?
>> The object order should not matter.
>>
>>> E.g. we can make all those methods do a safe 'iteration' by doing a
>>> single-threaded marking pass, reporting objects as we go, using a single
>>> work stack, and using a 2nd marking bitmap (to avoid double-visiting
>>> objects) that we can allocate just for this purpose and deallocate when
>>> done (after all, this should be a rare situation which is not
>>> performance-critical). Right?
>> Yes, that makes sense. So this just makes another traversal through the
>> heap, returning all reachable objects. Yes, the Verifier does that
>> already, and it does not take much code. The trouble with this approach
>> is that we would need to test it separately, because it exercises an
>> unusual code path.
>>
>> Heap dump on OOME can also fail, because we would try to commit some
>> native memory for the bitmap at that point.
>>
>>> I am assuming that all consumers would call object_iterate() during a
>>> safepoint (need to check this, but I'm pretty sure this is the case).
>>> We'd also need to ensure that we don't call those iterations ourselves
>>> from inside Shenandoah, unless we really want to (e.g. verification?),
>>> and provide the fast iteration, marked_object_iterate(), for our own use
>>> when we know that it is safe.
>> Verification uses neither object_iterate() nor marked_object_iterate(),
>> because it takes things slowly, carefully, and on its own :)
>>
>> -Aleksey
>>
> I added a test that exercises JVMTI heap iteration aggressively, and lo and
> behold, it crashes spectacularly (even with the 2nd bitmap). We currently
> cannot do this with concurrent GC going on:
>
> - it would call ensure_parsability(), which will plaster over TLABs while
>   we're evacuating (note that our GC threads don't participate in
>   safepointing, and we don't want them to).
> - dealing with dead objects is difficult: they may have a broken Klass*
>   (from previous concurrent class unloading) and broken oop refs.
>
> This proposal fixes it by implementing object_iterate() as a marking
> traversal. It commits an auxiliary bitmap only when needed, and uncommits
> it when done. It implements a very simple and dumb heap traversal using
> one thread, one oop stack and one marking bitmap. It reports only
> reachable objects, and that should be ok. It is only used by non-GC
> clients, mostly JVMTI. SH::ensure_parsability() (the public API) becomes
> a no-op. All linear heap scans are done only under our own control
> (using marked_object_iterate()), and need to use the new
> SH::make_tlabs_parsable().
>
> With this, we pass this new JVMTI heapdump test and all the other ones.
>
> http://cr.openjdk.java.net/~rkennke/onebitmap/webrev.03/
>
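The traversal proposed in the quoted message (one thread, one oop stack, one auxiliary marking bitmap committed only for the duration of the iteration, reporting only reachable objects) can be illustrated with a toy model. This is a minimal, hypothetical C++ sketch, not the code in the webrev: `ToyObject`, `AuxBitmap` and the free function `object_iterate()` are illustrative names, the heap is assumed to be a vector of slots with reference lists, and "committing" the bitmap is modeled as a nothrow allocation so that a failure (for example during a heap dump on OOME) is reported rather than crashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <new>
#include <vector>

// Hypothetical toy heap: object i lives in slot i and holds outgoing refs.
// klass_valid == false models a dead object with a broken Klass*, which a
// safe iteration must never read.
struct ToyObject {
  bool klass_valid;
  std::vector<std::size_t> refs;
};

// Hypothetical auxiliary mark bitmap, "committed" only while an iteration
// runs. commit() uses nothrow allocation so that running out of memory is
// reported to the caller instead of aborting.
class AuxBitmap {
 public:
  explicit AuxBitmap(std::size_t bits) : bits_(bits) {}
  ~AuxBitmap() { uncommit(); }
  AuxBitmap(const AuxBitmap&) = delete;
  AuxBitmap& operator=(const AuxBitmap&) = delete;

  bool commit() {
    std::size_t words = (bits_ + 63) / 64;
    words_ = new (std::nothrow) std::uint64_t[words]();  // all bits clear
    return words_ != nullptr;
  }
  void uncommit() {
    delete[] words_;
    words_ = nullptr;
  }

  // Marks bit idx; returns true only the first time, so each object is
  // pushed onto the work stack at most once.
  bool mark(std::size_t idx) {
    std::uint64_t mask = std::uint64_t{1} << (idx % 64);
    std::uint64_t& word = words_[idx / 64];
    if (word & mask) return false;
    word |= mask;
    return true;
  }

 private:
  std::size_t bits_;
  std::uint64_t* words_ = nullptr;
};

// Toy stand-in for a safe object_iterate(): single-threaded marking from
// the roots with one work stack and one auxiliary bitmap, reporting each
// reachable object to the closure exactly once, in no particular order.
// Returns false if the bitmap could not be committed.
bool object_iterate(const std::vector<ToyObject>& heap,
                    const std::vector<std::size_t>& roots,
                    const std::function<void(std::size_t)>& closure) {
  AuxBitmap bitmap(heap.size());
  if (!bitmap.commit()) return false;

  std::vector<std::size_t> stack;
  for (std::size_t root : roots) {
    if (bitmap.mark(root)) stack.push_back(root);
  }
  while (!stack.empty()) {
    std::size_t obj = stack.back();
    stack.pop_back();
    closure(obj);  // report while traversing
    for (std::size_t ref : heap[obj].refs) {
      if (bitmap.mark(ref)) stack.push_back(ref);
    }
  }
  return true;  // bitmap is uncommitted by ~AuxBitmap()
}
```

Because the traversal only follows references from the roots, a dead slot with a broken Klass* is never read as long as nothing reachable points to it, and no linear walk over TLABs is needed. The order in which the closure sees objects follows the stack and carries no meaning, which matches the point that consumers should not depend on it.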
And here comes another small update that makes the test actually verify 
that it has seen *some* objects. The JVMTI call might otherwise return 
with an error code and we wouldn't notice.
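The check described above can be sketched as follows. In the real test the iteration would be a JVMTI call such as `IterateThroughHeap`, whose `jvmtiError` result should be `JVMTI_ERROR_NONE`; here that call is replaced by a hypothetical `HeapIteration` function type and a hypothetical `IterError` enum so the sketch runs standalone. It treats a run as passing only if the call succeeded and the callback saw at least one object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical result code standing in for a JVMTI-style jvmtiError.
enum class IterError { kNone, kFailed };

// Hypothetical signature: the iteration takes a per-object callback and
// returns an error code.
using HeapIteration =
    std::function<IterError(const std::function<void()>& on_object)>;

// Passes only if the call succeeded AND at least one object was reported.
// Checking both catches an explicit error as well as an iteration that
// silently visits nothing.
bool heap_iteration_looks_sane(const HeapIteration& iterate) {
  std::size_t seen = 0;
  IterError err = iterate([&seen] { ++seen; });
  return err == IterError::kNone && seen > 0;
}
```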

http://cr.openjdk.java.net/~rkennke/onebitmap/webrev.04/

Ok to push?