RFC/RFR: Get rid of second bitmap

Fri Oct 6 12:27:50 UTC 2017

On 06.10.2017 at 10:03, Aleksey Shipilev wrote:
> On 10/05/2017 09:15 PM, Roman Kennke wrote:
>> There are known problems left. JVMTI might ask for a heapdump or heap traversal after marking has
>> been aborted and we don't have a valid bitmap to support heap scanning. This should now blow up
>> early with the above mentioned assert. We shall tackle this problem later IMO.
> I think we have to solve this before pushing. Because it may end up messy without the second bitmap,
> and we would have to revert the whole thing.
OK, I thought about this for a little while, driving around ... which
can be really helpful ;-)

AFAICT, the whole problem boils down to ShenandoahHeap::object_iterate()
and related *public* methods being problematic when called at random
times, in particular when the marking bitmap is not valid (e.g. marking
aborted, bitmap being cleared or just cleared, marking in progress).

We could work around that by squeezing in a marking pass before doing the
iteration. However, if we do this, we can just as well report the 
visited objects to the ObjectClosure while traversing. It shouldn't 
matter for consumers of object_iterate() in which order the objects 
arrive, right?

E.g. we can make all those methods do a safe 'iteration' by doing a
single-threaded marking pass that reports objects as we go. It would use
a single work stack and a second marking bitmap (to avoid
double-visiting objects) that we allocate just for this purpose and
deallocate when done (after all, this should be a rare situation which
is not performance-critical). Right? I am assuming that all consumers
call object_iterate() during a safepoint (need to check this, but
I'm pretty sure this is the case). We'd also need to ensure that we
don't call those iterations ourselves from inside Shenandoah, unless we
really want to (e.g. verification?). And we'd keep the fast iteration,
marked_object_iterate(), for our own use when we know that it is safe.
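
To make the idea concrete, here is a minimal sketch of such a pass. It
is a standalone toy model, not Shenandoah code: ToyObject,
ToyObjectClosure, CollectingClosure and safe_object_iterate() are all
hypothetical names, objects are plain indices into a vector, and a
std::vector<bool> stands in for the temporary second bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap object: only its outgoing references,
// expressed as indices into a toy heap.
struct ToyObject {
    std::vector<std::size_t> refs;
};

// Plays the role of an ObjectClosure; this toy signature is an assumption.
struct ToyObjectClosure {
    virtual ~ToyObjectClosure() = default;
    virtual void do_object(std::size_t obj) = 0;
};

// Single-threaded traversal from the roots that reports every reachable
// object exactly once, in whatever order the work stack yields them.
void safe_object_iterate(const std::vector<ToyObject>& heap,
                         const std::vector<std::size_t>& roots,
                         ToyObjectClosure& cl) {
    std::vector<bool> visited(heap.size(), false);  // temporary bitmap
    std::vector<std::size_t> stack;                 // single work stack

    for (std::size_t r : roots) {
        assert(r < heap.size());
        if (!visited[r]) {
            visited[r] = true;
            stack.push_back(r);
        }
    }

    while (!stack.empty()) {
        std::size_t obj = stack.back();
        stack.pop_back();
        cl.do_object(obj);  // report while traversing, no separate pass
        for (std::size_t ref : heap[obj].refs) {
            assert(ref < heap.size());
            if (!visited[ref]) {  // the bitmap prevents double visits
                visited[ref] = true;
                stack.push_back(ref);
            }
        }
    }
    // Bitmap and stack are released here, when the function returns.
}

// Example consumer that simply records what it was handed.
struct CollectingClosure : ToyObjectClosure {
    std::vector<std::size_t> seen;
    void do_object(std::size_t obj) override { seen.push_back(obj); }
};
```

Note that unreachable objects are never reported, and the arrival order
is traversal order rather than address order, which is exactly the
question raised above about what consumers expect.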

The alternative would be to make all our fast paths aware of possible
interruption, and to make the heap dumping/traversal code coordinate
with bitmap cleaning and concurrent marking. I wouldn't want to get
into that mess if we can avoid it.

Does that sound sensible?

Roman