RFC: One marking bitmap

Thu Oct 5 10:24:38 UTC 2017

On 05.10.2017 at 11:02, Aleksey Shipilev wrote:
> Hi,
>
> I am trying to understand why we need two marking bitmaps. The "next" bitmap is built during
> concurrent mark, and then gets swapped for the "complete" one at the end of concurrent mark. After
> that, all users poll the "complete" bitmap for the actual data. But there seem to be no users that
> need to do that while concurrent mark is running, or they can poll the "next" bitmap too!
>
> There are two problematic points I see:
>   a) with a single bitmap, concurrent mark cannot be abandoned without additional action, because
> marking bitmaps may be incomplete on abort. But in our code, an aborted concurrent mark leads either
> to a degenerate final mark or to a full GC, where we finish building the bitmaps again;
>   b) with a single bitmap, we need to clean the bitmaps *before* the init mark, and that means
> before setting TAMS, which means is_marked() is unreliable in that time window. That seems not to
> be a problem, since nothing polls marked data when the concurrent cycle is initiated.
>
> This experimental patch cuts out one bitmap, and thus trims down our native footprint ~2x:
>    http://cr.openjdk.java.net/~shade/shenandoah/wip-one-bitmap/webrev.01/
>   (passes hotspot_gc_shenandoah)
>
> Thoughts? What am I missing?
>
> Thanks,
> -Aleksey
>
>
The main problem has been concurrent class unloading. When we do this,
we unload classes that may still be referenced by unreachable objects in
non-cset regions. When we try to iterate such a region object-by-object,
we may hit such an object with a dangling Klass* and crash. Always
having a valid bitmap around means we can iterate based on actual
liveness info, which guarantees that we skip dead objects with dangling
Klass*.
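
To make the difference between the two ways of iterating concrete, here
is a minimal toy sketch in C++. It is not HotSpot code: ToyKlass,
ToyRegion, walk_by_size, walk_by_bitmap and make_demo_region are all
hypothetical names, and a nullptr at an object start stands in for a
dangling Klass*.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these names exist in HotSpot.
struct ToyKlass { std::size_t size_in_words; };

struct ToyRegion {
    // klass_at[i] is the class pointer stored at an object start.
    // nullptr at an object start stands in for a dangling Klass*
    // (the class was unloaded); non-start words are never read.
    std::vector<const ToyKlass*> klass_at;
    std::vector<bool> marked;  // one mark bit per word, set at live object starts
};

// Size-based walk: needs the class of every object, dead or alive.
// Returns false where real code would dereference a dangling Klass*.
bool walk_by_size(const ToyRegion& r, std::vector<std::size_t>& visited) {
    std::size_t p = 0;
    while (p < r.klass_at.size()) {
        const ToyKlass* k = r.klass_at[p];
        if (k == nullptr) return false;  // dead object, class is gone
        visited.push_back(p);
        p += k->size_in_words;
    }
    return true;
}

// Bitmap-based walk: only touches objects whose mark bit is set.
std::vector<std::size_t> walk_by_bitmap(const ToyRegion& r) {
    std::vector<std::size_t> live;
    std::size_t p = 0;
    while (p < r.marked.size()) {
        if (!r.marked[p]) { ++p; continue; }  // skip unmarked words
        const ToyKlass* k = r.klass_at[p];
        assert(k != nullptr);  // a marked object keeps its class alive
        live.push_back(p);
        p += k->size_in_words;
    }
    return live;
}

// Three objects: A (2 words, live), B (3 words, dead, class unloaded),
// C (1 word, live).
ToyRegion make_demo_region() {
    static const ToyKlass two{2}, one{1};
    ToyRegion r;
    r.klass_at = {&two, nullptr, nullptr, nullptr, nullptr, &one};
    r.marked   = {true, false, false, false, false, true};
    return r;
}
```

On the demo region, the size-based walk stops at B because it cannot
compute B's size, while the bitmap-based walk steps over B's words and
reaches C.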

That being said, I'd be very much in favor of getting rid of the 2nd
bitmap. It costs an additional 1/64th of the heap in native storage, and
that is significant.
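
As a back-of-the-envelope check of that figure, here is a small sketch.
It assumes one mark bit per 8-byte-aligned heap word, which is what gives
1/64; bitmap_bytes and GiB are names made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one mark bit per aligned object slot, so each bitmap byte
// covers 8 slots of align_bytes each.
constexpr std::uint64_t bitmap_bytes(std::uint64_t heap_bytes,
                                     std::uint64_t align_bytes) {
    return heap_bytes / (align_bytes * 8);  // 8 bits per bitmap byte
}

constexpr std::uint64_t GiB = 1ull << 30;

// With 8-byte alignment, a 64 GiB heap needs 1 GiB per bitmap.
static_assert(bitmap_bytes(64 * GiB, 8) == 1 * GiB, "1/64th of the heap");
```

Under that assumption, two bitmaps on a 64 GiB heap take 2 GiB of native
memory, and dropping one saves 1 GiB.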

But it also means we must ensure that we never attempt to scan the heap
after marking has been aborted.

Which should be possible:

- Don't do an update-refs phase after aborted marking. Slide into
full-gc and update references while traversing (!).
- Heap dump may need some additional thought: it must not happen between
cancelled marking and full-gc (or it must trigger its own full-gc pass
before dumping).
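
The heap dump point could be sketched roughly like this. It is a toy
state machine, not the actual Shenandoah control flow: ToyHeap,
ToyMarkState and all their members are hypothetical, and it only models
the second option (the dump triggers its own full-gc when it finds
marking cancelled).

```cpp
#include <cassert>

// Toy model only; these names are made up for illustration.
enum class ToyMarkState { Complete, InProgress, Cancelled };

struct ToyHeap {
    ToyMarkState mark = ToyMarkState::Complete;
    int full_gcs = 0;
    int dumps = 0;

    void start_mark()  { mark = ToyMarkState::InProgress; }  // single bitmap is cleared here
    void cancel_mark() { mark = ToyMarkState::Cancelled; }   // bitmap left incomplete
    void finish_mark() { mark = ToyMarkState::Complete; }

    // Full GC rebuilds the bitmap, so it is complete again afterwards.
    void full_gc() { ++full_gcs; mark = ToyMarkState::Complete; }

    // A heap walk must never see the window between cancelled marking
    // and full GC, so the dump closes that window itself.
    void heap_dump() {
        if (mark == ToyMarkState::Cancelled) full_gc();
        assert(mark != ToyMarkState::Cancelled);
        ++dumps;
    }
};
```

A dump requested right after cancel_mark() runs one full-gc first; a
second dump after that finds the bitmap complete and does not trigger
another.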

Roman