RFC: One marking bitmap

Thu Oct 5 15:42:42 UTC 2017

On 10/05/2017 05:05 PM, Roman Kennke wrote:
> Am 05.10.2017 um 16:57 schrieb Aleksey Shipilev:
>> On 10/05/2017 04:53 PM, Roman Kennke wrote:
>>> Am 05.10.2017 um 16:51 schrieb Aleksey Shipilev:
>>>> On 10/05/2017 12:24 PM, Roman Kennke wrote:
>>>>> The main problem has been concurrent class unloading. When we do this, we unload classes that may
>>>>> still be referenced by unreachable objects in non-cset regions. When we try to iterate such a region
>>>>> object-by-object, we may hit such an object with a dangling Klass* and crash. Always having a valid
>>>>> bitmap around means we can iterate based on actual liveness info, which guarantees us to skip dead
>>>>> objects with dangling Klass*.
>>>> I do not understand this explanation. Actually, I don't understand how second bitmap avoids this
>>>> issue. One marking bitmap is *also* valid after class unloading (during final mark) had happened,
>>>> and we can iterate over it safely. Can you do the more verbose example?
>>> I probably missed the part where we would cancel marking and don't have a valid marking bitmap to
>>> support safe iteration that skips unreachable objects...
>> But if we *do* cancel the marking, we either follow to degenerate final mark that completes the
>> bitmap, or we slide to Full GC. Where's the issue? I need verbose example. And you probably need it
>> too! And anyone who would be reading this thread years later -- too!
> Ok, you're right. Sorry.
> The issue is, or at least was (not sure it is still the case), that we would not slide into full GC.
> Instead, we do an update-refs pass to fix up any references before sliding into full GC. This would
> crash because of dangling Klass*.

So I guess it goes like this.

As you said: suppose the cycle completes with class unloading, and some dead objects with broken
Klass* are still sitting in the regions. Now, any iteration that consults the bitmaps is safe, because
it will not visit those dead objects. This includes the update-refs phase and other
marked_object_iterate users.
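
To make that concrete, here is a toy sketch of bitmap-guided iteration. It is not the Shenandoah
code: ToyObject, ToyRegion, toy_marked_object_iterate and count_dangling_visits are hypothetical
names, and a plain bool stands in for "this object's Klass* is still valid".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these types exist in HotSpot.
struct ToyObject {
    bool klass_valid;  // false models a dangling Klass* after class unloading
    int  payload;
};

struct ToyRegion {
    std::vector<ToyObject> objects;  // objects in address order
    std::vector<bool>      marked;   // one mark bit per object
};

// Visits only the objects whose mark bit is set. Unmarked (dead) objects are
// skipped without being inspected, so a dangling klass is never touched.
template <typename Visitor>
void toy_marked_object_iterate(const ToyRegion& r, Visitor visit) {
    assert(r.objects.size() == r.marked.size());
    for (std::size_t i = 0; i < r.objects.size(); ++i) {
        if (r.marked[i]) {
            visit(r.objects[i]);
        }
    }
}

// Counts how many visited objects have a dangling klass. With a complete
// bitmap this should be zero, because only live objects are marked.
std::size_t count_dangling_visits(const ToyRegion& r) {
    std::size_t bad = 0;
    toy_marked_object_iterate(r, [&](const ToyObject& o) {
        if (!o.klass_valid) ++bad;
    });
    return bad;
}
```

A plain object-by-object walk over the same region would look at every element, including the
ones with klass_valid == false, which is the crash scenario discussed above.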

There are two complications:
 a) The code that walks the heap without consulting the bitmaps. This is problematic, because it
can indeed step on a broken Klass*. But it can also step on broken oop fields that point nowhere, or
point to some garbage, or worse. So it would seem such code is already broken, and we would need to
fix it if observed;
 b) The code that walks the heap with bitmaps, but *during* concmark or *after* a cancelled mark.
There, the bitmaps are incomplete: they don't include some objects that are alive, because concmark
had not visited them yet. This should not lead to a crash, because an incomplete bitmap would not give
you dead objects either way.
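
The argument in (b) can be sketched with a toy marker that stops early. This is only an
illustration under my own assumptions: ToyHeap, mark_with_budget and is_subset are hypothetical
names, and a step budget stands in for a cancelled concurrent mark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: object i references the objects listed in refs[i].
// Hypothetical model, not HotSpot code.
struct ToyHeap {
    std::vector<std::vector<std::size_t>> refs;
    std::vector<std::size_t>              roots;
};

// Marks from the roots, but stops after scanning at most max_steps objects.
// A small budget models a mark that was cancelled before completion.
std::vector<bool> mark_with_budget(const ToyHeap& h, std::size_t max_steps) {
    std::vector<bool> marked(h.refs.size(), false);
    std::vector<std::size_t> stack;
    for (std::size_t r : h.roots) {
        assert(r < h.refs.size());
        if (!marked[r]) { marked[r] = true; stack.push_back(r); }
    }
    std::size_t steps = 0;
    while (!stack.empty() && steps < max_steps) {
        std::size_t cur = stack.back();
        stack.pop_back();
        ++steps;
        for (std::size_t t : h.refs[cur]) {
            assert(t < h.refs.size());
            if (!marked[t]) { marked[t] = true; stack.push_back(t); }
        }
    }
    return marked;
}

// True if every bit set in `partial` is also set in `complete`.
bool is_subset(const std::vector<bool>& partial,
               const std::vector<bool>& complete) {
    assert(partial.size() == complete.size());
    for (std::size_t i = 0; i < partial.size(); ++i) {
        if (partial[i] && !complete[i]) return false;
    }
    return true;
}
```

Since marking only ever sets bits for objects reached from the roots, a partial result is a subset
of the complete one: it may miss live objects, but it never contains an unreachable one.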

The notable examples for (b) are heap dumps, JVMTI heap walkers, and some of our debugging code.
For heap dumps, we can force a Full GC to purge all dirty objects out, and thus guarantee safe
iteration. For JVMTI IterateOverHeap and friends, triggering a Full GC is going to be more
complicated. This is where a second bitmap helps to handle things better. Is there a way out without
a second bitmap?

Thanks,
-Aleksey